Abstract

We present a Markov chain (Dikin walk) for sampling from a convex body equipped with
a self-concordant barrier, whose mixing time from a "central point" is strongly polynomial
in the description of the convex set. The mixing time of this chain is invariant under affine
transformations of the convex set, thus eliminating the need for first placing the body in an
isotropic position. This strengthens previous results of [11] for polytopes and generalizes
these results to arbitrary convex sets. In the case of a convex set $K$ defined by a semidefinite
constraint of rank at most $\alpha$ and at most $m$ additional linear constraints, our results specialize
to the following statement.

Let $s \ge \frac{|p - x|}{|q - x|}$ for any chord $pq$ of $K$ passing through a point $x \in K$. Then, after

$$t = O\left(n(m + n\alpha)\left(n \ln((m + n\alpha)s) + \ln\frac{1}{\epsilon}\right)\right)$$

steps are taken by a Dikin walk starting at $x$, the total variation distance and the $L_2$ distance
of the density $\rho(x_t)$ of the point to the uniform density are less than $\epsilon$.

On every convex set of dimension $n$, there exists a self-concordant barrier whose "complexity"
is polynomially bounded. Consequently, a rapidly mixing Markov chain of the kind we describe
can be defined on any convex set. We use these results to design an algorithm consisting of a
single random walk for optimizing a linear function on a convex set. We show that this random
walk reaches an approximately optimal point in polynomial time with high probability and
that the corresponding objective values converge with probability 1 to the optimal objective
value as the number of steps tends to infinity. One technical contribution is a family of lower
bounds for the isoperimetric constants of (weighted) Riemannian manifolds on which interior
point methods perform a kind of steepest descent. Using results of Barthe [2] and Bobkov and
Houdre [5] on the isoperimetry of products of (weighted) Riemannian manifolds, we obtain
sharper upper bounds on the mixing time of Dikin walk on products of convex sets than the
bounds obtained from a direct application of the Localization Lemma, on which the analyses
of all random walks on convex sets have relied since (Lovasz and Simonovits, 1993).

1 Introduction

Sampling from a nearly uniform distribution on a high dimensional convex set is an important 
ingredient in several computational tasks, including computing its volume [6, 17] and sampling 
from lattice points in it [12]. The usual strategy for doing so is to design a rapidly mixing Markov 
chain whose stationary distribution is the uniform distribution, and then run it for sufficiently long 
and pick the final point as a sample. The mixing times of all known Markov chains for sampling 
generic convex sets depend on the aspect ratio of the set, a measure of which is the ratio between 
its diameter and width. 

Generalizing prior work with Kannan on polytopes [11], we present a Markov chain for sampling
from a convex body defined using a combination of linear, hyperbolic and self-concordant constraints
(by which we mean constraints corresponding to which there is a logarithmic, hyperbolic or
self-concordant barrier respectively). When restricted to the case of polytopes, the bounds we present
imply (up to universal constants) the bounds in (Kannan and Narayanan, [11]). The mixing time of
this chain is an affine invariant, thus eliminating the need for first placing the body in an isotropic
position. A self-concordant barrier $F$ on a convex set $K$ is a convex function whose domain is the
interior of $K$, that tends to infinity as one approaches its boundary, and whose second derivative
at a point along any unit direction is large in a suitable sense compared to its first and third
derivatives along the same vector. In order to convey the basic idea, let $D^2F(x)$ be the Hessian
matrix of $F$ at $x$. We define the transition measure $P_x$ corresponding to $x$ to roughly be a Gaussian
whose covariance matrix is a fixed multiple of $(D^2F(x))^{-1}$. The properties of the barrier function
cause the random walk to avoid the boundary, but at the same time take relatively large steps. For
example, let $K$ be the 2-dimensional Euclidean ball $\{x : \|x\| < R\}$ and let $F(x) := -\ln(R^2 - \|x\|^2)$
be a self-concordant barrier for it. Then, for $x \in K$, up to constants, the expected magnitude of
the component of a step in the radial direction is $R - \|x\|$, while the expected magnitude of the
component in the transverse direction is roughly $\sqrt{R^2 - \|x\|^2}$. We see that the mixing time from
the center is independent of the diameter $2R$, and that the size and typical orientation of a step vary
according to the local geometry.
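
The step sizes quoted for the ball can be checked numerically. The following is a minimal sketch, assuming the covariance is exactly $(D^2F(x))^{-1}$ (the text only fixes it up to a constant multiple); the function names `ball_barrier_hessian` and `step_scales` are illustrative and do not come from the paper.

```python
import math

def ball_barrier_hessian(x, R):
    """Hessian of F(x) = -ln(R^2 - |x|^2) at a point x strictly inside the disc."""
    x1, x2 = x
    g = R * R - (x1 * x1 + x2 * x2)
    if g <= 0:
        raise ValueError("x must lie strictly inside the ball")
    # Differentiating grad F = 2x/g once more gives D^2 F = (2/g) I + (4/g^2) x x^T.
    return [[2 / g + 4 * x1 * x1 / g**2, 4 * x1 * x2 / g**2],
            [4 * x1 * x2 / g**2, 2 / g + 4 * x2 * x2 / g**2]]

def step_scales(x, R):
    """Standard deviations (radial, transverse) of a Gaussian with covariance (D^2 F(x))^{-1}.

    Assumes x is not the origin, so that the radial direction is defined.
    """
    H = ball_barrier_hessian(x, R)
    norm = math.hypot(x[0], x[1])
    u = (x[0] / norm, x[1] / norm)   # radial unit vector
    t = (-u[1], u[0])                # transverse unit vector

    def quad(v):
        return sum(v[i] * H[i][j] * v[j] for i in range(2) for j in range(2))

    # u and t are eigenvectors of H, so each std dev is 1/sqrt(eigenvalue).
    return 1 / math.sqrt(quad(u)), 1 / math.sqrt(quad(t))

radial, transverse = step_scales((9.0, 0.0), 10.0)
print(radial / (10.0 - 9.0))                     # ratio to R - |x|
print(transverse / math.sqrt(10.0**2 - 9.0**2))  # ratio to sqrt(R^2 - |x|^2)
```

Both printed ratios stay between $1/\sqrt{2}$ and 1 for every interior point, and rescaling $x$ and $R$ by a common factor rescales both step sizes by that factor, which is the scale invariance the example is meant to convey.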

We use this random walk to design an algorithm for optimization, which essentially consists 
of doing such a random walk on a projectively transformed version of K. This transformation 
preferentially dilates regions corresponding to a larger objective value, causing them to occupy 
more space and hence become the target of a random walk. In the case of polytopes, a slightly 
different version of this appeared in [11]. The Markov chain considered in [11] was ergodic, while 
the one we use here is not. The analysis of the non-ergodic Markov chain hinges upon the fact that 
it can be viewed as a limit of ergodic Markov chains. 

1.1 Barrier oracle model for convex sets 

There are two standard information models for convex sets in the operations research literature, the 
separation model and the (self-concordant) barrier model (See Freund [7], page 2). Existing work 
on sampling convex sets, with the exception of (Kannan and Narayanan [11]) has focussed on the 
separation model and a weaker model known as the membership oracle model. The self-concordant 
barrier model we will consider is the following. 

1. We are guaranteed that the origin belongs to $K$ and that $K$ has a self-concordant barrier $F$
with parameter $\nu$ (see Section 2).

2. We are given a real number $s$ such that for any chord $pq$ of $K$ through the origin, $\frac{|p|}{|q|} \le s$.

3. On querying a point $x \in \mathbb{R}^n$, we are returned a positive semidefinite matrix corresponding to
the Hessian of $F$ if $x \in K$ and returned "No" if $x \notin K$.

1.2 Implementing the barrier oracle in the linear and semidefinite cases 

The most frequently encountered barrier functions are the logarithmic barrier for polytopes
and the log det barrier for convex sets defined by semidefinite constraints (see Section 2). We
discuss the implementation of the barrier oracle for the logarithmic barrier below, in the case where
$x$ is in the set. Let $K_\ell$ be the set of points satisfying the system of inequalities $Ax \le 1$. Then,
$H(x) = A^T D(x)^2 A$ where $D(x)$ is the diagonal matrix whose $i$-th diagonal entry is $d_{ii}(x) = \frac{1}{1 - a_i^T x}$.

By results of Baur and Strassen [3], the complexity of solving linear equations and of computing
the determinant of an $n \times n$ matrix is $O(n^\gamma)$. The computation of $A^T D(x)^2 A$ can be achieved using
$O(mn^{\gamma-1})$ arithmetic operations, by partitioning a padded extension of $A^T D$ into square
matrices. Thus, the complexity of the barrier oracle is $O(mn^{\gamma-1})$ arithmetic operations, where
$\gamma < 2.377$ is the exponent for matrix multiplication.

In the case of a semidefinite constraint of rank $\nu$, the number of arithmetic steps needed for
computing the Hessian of the log det barrier is $O(n^2\nu^2 + n\nu^\gamma)$ (see Section 11.3, [25]; we have
replaced an exponent 3 in [25] with $\gamma$). Given the Hessian, it can be inverted in $O(n^\gamma)$ arithmetic
steps. This is needed to implement one step of the Dikin walk.
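
As an illustration of the oracle in the logarithmic case, here is a minimal sketch that returns $A^T D(x)^2 A$ or "No". It uses the naive $O(mn^2)$ accumulation rather than the fast matrix multiplication discussed above, and the name `barrier_oracle` and the list-of-rows representation of $A$ are assumptions made for the example.

```python
def barrier_oracle(A, x):
    """Hessian A^T D(x)^2 A of the log barrier of {Ax < 1}, or "No" if x is outside.

    A is a list of rows a_i, x a list of coordinates (both hypothetical conventions).
    """
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for a in A:
        slack = 1.0 - sum(aj * xj for aj, xj in zip(a, x))
        if slack <= 0.0:
            return "No"              # x violates (or touches) the constraint a.x < 1
        d2 = 1.0 / slack**2          # d_ii(x)^2 = 1 / (1 - a_i.x)^2
        for j in range(n):
            for k in range(n):
                H[j][k] += d2 * a[j] * a[k]
    return H

# The square [-1, 1]^2 written as Ax < 1.
square = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(barrier_oracle(square, [0.0, 0.0]))
print(barrier_oracle(square, [2.0, 0.0]))
```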

1.3 Presentation of the convex set K 

For the definitions of logarithmic, hyperbolic and self-concordant barriers, we refer to Section 2. 
We will assume that the convex set $K$ is specified as the set of points that satisfy a family of
constraints

$$K := \bigcap_{i=1}^{m} \{F_i(x) < \infty\}$$

where the $F_i$ are either logarithmic, hyperbolic or arbitrary self-concordant functions. Without
loss of generality, we may aggregate these barriers and assume that $K := K_\ell \cap K_h \cap K_s$,
where $K_\ell$ is a polytope with $m$ faces accompanied by the logarithmic barrier $F_\ell$, $K_h$ is a convex set
accompanied by a hyperbolic barrier $F_h$ with parameter $\nu_h$, and $K_s$ is a convex set accompanied
by a $\nu_s$-self-concordant barrier $F_s$. Although their intersection is bounded, each of these convex
sets may be unbounded. Define the self-concordant barrier function

$$F := F_\ell + nF_h + n^2F_s,$$

and define

$$\nu := m + n\nu_h + (n\nu_s)^2 \qquad (1)$$

to be the complexity parameter of $F$ (which is different from its self-concordance parameter, this
being $m + n\nu_h + n^2\nu_s$). Let $C$ be a sufficiently large universal constant. We define the radius of
a Dikin step, $r$, to be $1/C$. For a point $x \in K$ and $v \in \mathbb{R}^n$, we define

$$\|v\|_x^2 := D^2F(x)[v,v].$$

The random walk we use here is a variation of the Dikin walk defined in [11], in which instead of 
picking the next point from a Dikin ellipsoid, one picks it from a Gaussian having that covariance. 

1.4 Related Work 

Let $B(x, r)$ be defined to be the $n$-dimensional Euclidean ball of radius $r$ centered at $x$ and suppose
$K$ is an $n$-dimensional convex set such that $B(0, r) \subseteq K \subseteq B(0, R)$. The Markov chain known as
the "Ball walk" [17, 10] is defined as follows. If the random walker is at a point $x_i$ in a convex
body $K$ at time step $i$, a random point $z$ is picked in $B(x_i, O(1/\sqrt{n}))$, and $x_{i+1}$ is set to $z$ if it lies
in $K$, otherwise the move is rejected and $x_{i+1}$ is set to $x_i$. The mixing time of this walk from a
warm start (i.e. a density that is bounded above by $O(1)$ times the stationary density) in order
to achieve a constant total variation distance to stationarity is $O^*(n^2R^2/r^2)$. However, for no single
pre-specified point (such as the center of mass, as opposed to a random one) is it known to mix in
polynomial time. More recently, a random walk known as Hit-and-Run was analyzed in [16, 18].
If the random walker is at a point $x_i$ in a convex body $K$ at time step $i$, a vector $v$ is picked from the
uniform distribution on the unit sphere, and $x_{i+1}$ is chosen from the uniform measure
on the chord $\{x_i + \lambda v \mid \lambda \in \mathbb{R}\} \cap K$ through $x_i$. Unlike the Ball walk, this walk provably mixes rapidly from any
interior point, with a weak (logarithmic) dependence on the distance of the starting point from the
boundary. From a warm start, the mixing time of Hit-and-Run is $O^*(n^2R^2/r^2)$, and its mixing time
from a fixed point at a distance $d$ from the boundary is $O\left(n^3\frac{R^2}{r^2}\ln\frac{R}{d\epsilon}\right)$. The mixing times of the
Dikin walk in two cases of interest are as follows (details are provided in Theorem 2). Let $x \in K$
and for all chords $pq$ passing through $x$, $\frac{|p - x|}{|q - x|} \le s$. We will call such a point $x$ $s$-central. Suppose
$K$ is

(S) a slice of the semidefinite cone $S^{\nu \times \nu}$ of $\nu \times \nu$ matrices endowed with the hyperbolic barrier
$F(x) = -\ln\det x$ or

(Q) the intersection of $m = \frac{\nu}{2}$ ellipsoids, $A_1B \cap A_2B \cap \dots \cap A_mB$ where $A_i$ are non-singular
affine transformations and $B$ is the Euclidean Ball. In this case, the hyperbolic barrier is
$F(x) = -\sum_{i=1}^{m}\ln(1 - \|A_i^{-1}(x)\|^2)$.

Then the mixing time starting at $x$ is $O\left(n^2\nu\left(n\ln(\nu s) + \ln\frac{1}{\epsilon}\right)\right)$. Whether or not the bodies are
in isotropic position, in the above cases (S) and (Q) corresponding to semidefinite and quadratic
programs, when $\nu = O(n^{1-\epsilon})$, the mixing time bounds are an improvement over the existing bounds
for Hit-and-Run [18].

If $K$ is defined by semidefinite constraints, from a point $x \in K$, one step for Hit-and-Run
requires $\Omega(\log(R/d))$ membership operations, each of which requires testing the semidefiniteness of
a $\nu \times \nu$ matrix (which takes $O(\nu^\gamma)$ arithmetic steps), where $R$ is the radius of a circumscribing ball,
and $d$ is the distance of $x$ to the boundary of $K$. Convex sets defined by semidefinite programs
can be very ill-conditioned, and the best possible a priori upper bound on $\log\frac{R}{d}$ is exponential in $L$,
where $L$ is the total bit-length of rational data defining $K$ and the point [28]. In the general setting,
the number of arithmetic operations needed for implementing a Dikin step would be independent
of $R/r$, but would depend on two affine-invariant quantities: the parameter associated with the
barrier and $\log s$, where the starting point is $s$-central. In ill-conditioned semidefinite programs,
$\log s$ can be exponential in the bit-length, but for special points it can be much smaller; for example,
for the center of mass or the analytic center, it is $O(\log n)$ and $O(\log \nu)$ respectively.

Lovasz [16] proved a lower bound of $\Omega(n^2p^2)$ on the mixing time of Hit-and-Run in a cylinder
$B_n \times [-p, p]$ from a warm start, where $B_n$ is the unit ball in $n$ dimensions. Dikin walk has a
mixing time of $O(n^2)$ from a warm start. Thus for a cylinder with $p = \omega(1)$, the lower bound on
the number of steps needed for Hit-and-Run to mix (without rescaling the body) is larger than the
upper bound on the number of steps for Dikin walk.

More generally, we can compare the upper bounds on the number of (barrier) oracle calls needed
to generate a point from a convex set $K$ whose total variation distance from the uniform is $\epsilon$,
when the starting point is at a distance $\eta$ from the boundary. We assume that the barrier $F$ has
complexity parameter $\nu$. For Hit-and-Run, the number of oracle calls is

$$O\left(n^3\frac{R^2}{r^2}\ln\frac{R}{\eta\epsilon}\right).$$

For Dikin walk, this is

$$O\left(n\nu\left(n\ln\frac{R\nu}{\eta} + \ln\frac{1}{\epsilon}\right)\right).$$

The ratio between the bounds for number of oracle calls for Dikin walk and the number of oracle
calls for Hit-and-Run is $O^*\left(\frac{\nu r^2}{nR^2}\right)$.



In the specific case where the constraints are either semidefinite or linear, we can compare the
upper bounds on the number of arithmetic operations needed in Hit-and-Run and Dikin walk.
Suppose, as above, that $K$ is a convex set and the starting point is at a distance $\eta$ from the
boundary, and that it is defined by $m$ linear constraints and additionally, semidefinite constraints
of total rank $\alpha$ (which can be as low as $O(1)$, e.g. for the intersection of a constant number of
ellipsoids). Then, the number of arithmetic steps for implementing one Dikin step is

$$O\left(mn^{\gamma-1} + n^2\alpha^2 + n\alpha^\gamma\right),$$

by the discussion in Subsection 1.2. For Hit-and-Run, the number of arithmetic steps needed to
make one move in a naive implementation is $O^*\left(\log(R/r)(mn + n\alpha^2 + \alpha^\gamma)\right)$ (since the natural way
of certifying positive semidefiniteness is to take a Cholesky factorization, which has a complexity
$O(\alpha^\gamma)$, computing the new semidefinite matrix after one step has a complexity $O(n\alpha^2)$ (Section 11.3,
[25]) and testing containment in the region defined by linear constraints takes $O(nm)$ operations).
We see that

1. If $m < n\alpha^2 + \alpha^\gamma$, then the ratio between the number of arithmetic steps for one move of Dikin
walk and one move of Hit-and-Run is not more than $O^*(n)$.

2. If $m > n\alpha^2 + \alpha^\gamma$, then the ratio between the number of arithmetic steps for one move of Dikin
walk and one move of Hit-and-Run is not more than $O^*(n^{\gamma-2}) < O^*(n^{0.38})$.

Combining the arithmetic complexity of implementing one step of Hit-and-Run with the mixing
time, the ratio between the number of arithmetic steps needed to produce one random point using
Dikin walk to the number of arithmetic steps needed for producing one random point using
Hit-and-Run is $O^*\left(\frac{(m + n\alpha)r^2}{R^2}\right)$ if $m < n\alpha^2 + \alpha^\gamma$ and $O^*\left(\frac{(m + n\alpha)n^{\gamma-3}r^2}{R^2}\right)$ if $m > n\alpha^2 + \alpha^\gamma$.

2 Self-concordant barriers 

Let $K$ be a convex subset of $\mathbb{R}^n$ that is not contained in any $(n-1)$-dimensional affine subspace and
let $\mathrm{int}(K)$ denote its interior. For any function $F$ on $\mathrm{int}(K)$ having continuous derivatives of order $k$,
for vectors $h_1, \dots, h_k \in \mathbb{R}^n$ and $x \in \mathrm{int}(K)$, for $k \ge 1$, we recursively define

$$D^kF(x)[h_1, \dots, h_k] := \lim_{\epsilon \to 0} \frac{D^{k-1}F(x + \epsilon h_k)[h_1, \dots, h_{k-1}] - D^{k-1}F(x)[h_1, \dots, h_{k-1}]}{\epsilon},$$

where $D^0F(x) := F(x)$. Following Nesterov and Nemirovskii, we call a real-valued function
$F : \mathrm{int}(K) \to \mathbb{R}$ a regular self-concordant barrier if it satisfies the conditions stated below. For
convenience, if $x \notin \mathrm{int}(K)$, we define $F(x) = \infty$.

1. (Convex, Smooth) F is a convex thrice continuously differentiable function on int{K). 

2. (Barrier) For every sequence of points $\{x_i\} \in \mathrm{int}(K)$ converging to a point $x \notin \mathrm{int}(K)$,
$\lim_{i \to \infty} F(x_i) = \infty$.

3. (Differential Inequalities) For all $h \in \mathbb{R}^n$ and all $x \in \mathrm{int}(K)$, the following inequalities hold.

(a) $D^2F(x)[h, h]$ is 2-Lipschitz continuous with respect to the local norm, which is equivalent
to

$$D^3F(x)[h,h,h] \le 2\left(D^2F(x)[h,h]\right)^{3/2}.$$

(b) $F(x)$ is $\nu$-Lipschitz continuous with respect to the local norm defined by $F$,

$$|DF(x)[h]|^2 \le \nu D^2F(x)[h,h].$$

We call the smallest positive integer $\nu$ for which this holds the self-concordance parameter of
the barrier.

It follows from these conditions that if $F$ is a self-concordant barrier for $K$ and $A$ is a non-singular
affine transformation, then $F_A(x) := F(A^{-1}x)$ is a self-concordant barrier for $AK$. This fact is
responsible for the affine-invariance of Dikin walk. Some examples of convex sets for which explicit
barriers are known are

1. Convex sets defined by hyperbolic constraints. This set includes sections of semidefinite cones. 
Polytopes and the intersections of ellipsoids can be expressed as sections of semidefinite cones. 

2. Sections of $\ell_p$ balls.

3. Convex sets defined by the epigraphs of matrix norms (see page 199 of [23]). 

For other examples and methods of constructing barriers for new convex sets by combining existing 
barriers, see Chapter 5 of [23]. 

2.1 Hyperbolic barriers 

We refer the reader to [9] for the definition of a hyperbolic barrier. For the concrete applications in
this paper, it suffices to note that on the semidefinite cone $S^{m \times m}$, $-\ln\det x$ is a hyperbolic barrier
with parameter $m$, and that on the intersection of ellipsoids $A_1B \cap A_2B \cap \dots \cap A_mB$, where $A_i$
are non-singular affine transformations and $B$ is the Euclidean Ball, $-\sum_{i=1}^{m}\ln(1 - \|A_i^{-1}(x)\|^2)$ is a
hyperbolic barrier with parameter $2m$.

Lemma 1 (Theorem 4.2, Güler [9]). If $F$ is a hyperbolic barrier,

$$|D^4F(x)[h,h,h,h]| \le 6\left(D^2F(x)[h,h]\right)^2.$$

2.2 Logarithmic barrier of a polytope 

Given any set of linear constraints $\{a_i^Tx \le 1\}_{i=1}^{m}$, the logarithmic barrier is a real-valued function
defined on the intersection of the halfspaces defined by these constraints, and is given by

$$F(x) = -\sum_{i=1}^{m}\ln(1 - a_i^Tx).$$

2.3 Dikin Ellipsoids 

Around any point $x \in K$, the Dikin ellipsoid (of radius $r$) is defined to be

$$D_x^r := \{y : D^2F(x)[x - y, x - y] \le r^2\}.$$

Fact 1. Dikin ellipsoids are affine invariants in that, if the Dikin ellipsoid of radius $r$ around
a point $x \in K$ is $D_x^r$ and $T$ is a non-singular affine transformation of $K$, the Dikin ellipsoid of
radius $r$ centered at the point $Tx$ for $T(K)$ is $T(D_x^r)$, as long as the new barrier that is used is
$G(y) := F(T^{-1}y)$.

Fact 2. For any $y$ such that

$$D^2F(x)[x - y, x - y] = r^2 < 1,$$

for any vector $h \in \mathbb{R}^n$,

$$(1 - r)^2D^2F(x)[h, h] \le D^2F(y)[h, h] \le \frac{1}{(1 - r)^2}D^2F(x)[h, h]. \qquad (2)$$

Also, the Dikin ellipsoid centered at $x$, having radius 1, is contained in $K$. This has been shown in
Theorem 2.1.1 of Nesterov and Nemirovskii [23].

The following was proved in a more general context by Nesterov and Todd in Theorem 4.1, [26].

Theorem 1 (Nesterov-Todd). Let $pq$ be a chord of a polytope $P$ and $x, y$ be interior points on it
so that $p, x, y, q$ are in order. Let $D_x$ be the Dikin ellipsoid of unit radius centered at a
point $x$. Then $z \in D_y$ implies that $p + \frac{|x - p|}{|y - p|}(z - p) \in D_x$.
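
The containment of Dikin ellipsoids of radius below 1 in $K$ can be checked numerically for the logarithmic barrier. The sketch below is an illustration only: the triangle, the base point and the helper names are assumptions chosen for the example, not objects from the paper.

```python
import math

def log_barrier_hessian(A, x):
    """Hessian sum_i a_i a_i^T / (1 - a_i.x)^2 of the log barrier of {Ax < 1}."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for a in A:
        slack = 1.0 - sum(aj * xj for aj, xj in zip(a, x))
        for j in range(n):
            for k in range(n):
                H[j][k] += a[j] * a[k] / slack**2
    return H

def dikin_boundary_point(A, x, w, r):
    """Point x + t*w with D^2F(x)[t w, t w] = r^2, on the boundary of the radius-r ellipsoid."""
    H = log_barrier_hessian(A, x)
    n = len(x)
    q = sum(w[j] * H[j][k] * w[k] for j in range(n) for k in range(n))
    t = r / math.sqrt(q)
    return [xj + t * wj for xj, wj in zip(x, w)]

def inside(A, y):
    """True if y satisfies every constraint a_i.y < 1."""
    return all(sum(aj * yj for aj, yj in zip(a, y)) < 1.0 for a in A)

# A triangle containing the origin, and an off-center interior point (both illustrative).
triangle = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [0.6, 0.3]
points = [
    dikin_boundary_point(
        triangle, x,
        [math.cos(2 * math.pi * i / 360), math.sin(2 * math.pi * i / 360)],
        0.99)
    for i in range(360)
]
all_inside = all(inside(triangle, y) for y in points)
print(all_inside)
```

For the logarithmic barrier the containment is immediate: $D^2F(x)[u,u] = \sum_i (a_i^Tu)^2/(1 - a_i^Tx)^2 \le r^2 < 1$ forces $|a_i^Tu| < 1 - a_i^Tx$ for every $i$, so $x + u$ satisfies every constraint.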



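3 The Dikin walk

For $x \in \mathrm{int}(K)$, let $G_x$ denote the Gaussian density function given by

$$G_x(y) := \left(\frac{n}{2\pi r^2}\right)^{n/2}\exp\left(-\frac{n\|x - y\|_x^2}{2r^2} + V(x)\right),$$

where

$$V(x) = \frac{1}{2}\ln\det D^2F(x).$$

3.1 Algorithm 1

Let $x_0 \in \mathrm{int}(K)$. For $i \ge 1$, given $x_{i-1}$,

1. Toss a fair coin. If Heads let $x_i := x_{i-1}$.

2. Else

(a) Choose $z$ from the density $G_{x_{i-1}}$.

(b) If $z \in K$, let

$$x_i := \begin{cases} z, & \text{with probability } \min\left(1, \frac{G_z(x_{i-1})}{G_{x_{i-1}}(z)}\right) \\ x_{i-1}, & \text{otherwise.} \end{cases}$$

(c) If $z \notin K$, let $x_i := x_{i-1}$.

Figure 1: Trajectory of a Dikin walk for optimizing a linear function on a convex set

Theorem 2 (Sampling). Let $K \ni 0$ be an $n$-dimensional convex set accompanied by a barrier $F$
as in Subsection 1.3, with complexity parameter $\nu$. Let $s \ge \frac{|p|}{|q|}$ for any chord $pq$ of $K$ containing
the origin. Then, the number of steps $x_1, \dots, x_t$ that need to be taken before the total variation
distance and the $L_2$ distance of the density $\rho(x_t)$ of $x_t$ to the uniform density is less than $\epsilon$ is
$O\left(n\nu\left(n\ln(\nu s) + \ln\frac{1}{\epsilon}\right)\right)$. The number of steps needed from a warm start, i.e. when the $L_\infty$ norm
of $\rho(x_0)(\mathrm{vol}(K))$ is $O(1)$, is $O\left(n\nu\ln\frac{1}{\epsilon}\right)$.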

In particular, if $K$ is

(S) a slice of the semidefinite cone $S^{r \times r}$ of $r \times r$ matrices with $F(x) = -\ln\det x$ or

(Q) the intersection of $r$ ellipsoids, $A_1B \cap A_2B \cap \dots \cap A_rB$ where $A_i$ are non-singular affine
transformations and $B$ is the Euclidean Ball. In this case, $F(x) = -\sum_{i=1}^{r}\ln(1 - \|A_i^{-1}(x)\|^2)$,

the mixing times from a fixed "$s$-central" point or a warm start, respectively, are $O\left(n^2r\left(n\ln(nrs) + \ln\frac{1}{\epsilon}\right)\right)$
and $O\left(n^2r\ln\frac{1}{\epsilon}\right)$.
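
One lazy step of a Dikin walk with Gaussian proposals can be sketched as follows for a polytope $\{Ax < 1\}$ with the logarithmic barrier. This is a minimal sketch under stated assumptions, not the paper's implementation: the proposal has covariance $\frac{r^2}{n}(D^2F(x))^{-1}$, the acceptance rule is the Metropolis filter $\min(1, G_z(x)/G_x(z))$, the value $r = 0.5$ in the usage lines is arbitrary, and all function names are hypothetical.

```python
import math
import random

def hessian(A, x):
    """Hessian of F(x) = -sum_i ln(1 - a_i.x), or None if x is outside {Ax < 1}."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for a in A:
        slack = 1.0 - sum(aj * xj for aj, xj in zip(a, x))
        if slack <= 0.0:
            return None
        for j in range(n):
            for k in range(n):
                H[j][k] += a[j] * a[k] / slack**2
    return H

def cholesky(H):
    """Lower-triangular L with L L^T = H, for symmetric positive definite H."""
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = H[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def log_density(A, x, y, r):
    """ln G_x(y), dropping the additive constant (n/2) ln(n / (2 pi r^2))."""
    n = len(x)
    H = hessian(A, x)
    d = [yj - xj for xj, yj in zip(x, y)]
    quad = sum(d[j] * H[j][k] * d[k] for j in range(n) for k in range(n))
    L = cholesky(H)
    half_logdet = sum(math.log(L[i][i]) for i in range(n))  # (1/2) ln det H
    return -n * quad / (2.0 * r * r) + half_logdet

def dikin_step(A, x, r, rng):
    """One lazy step from an interior point x; returns the next point."""
    n = len(x)
    if rng.random() < 0.5:                     # fair coin: stay put on Heads
        return x
    L = cholesky(hessian(A, x))
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = [0.0] * n                              # solve L^T u = g, so cov(u) = H^{-1}
    for i in reversed(range(n)):
        u[i] = (g[i] - sum(L[k][i] * u[k] for k in range(i + 1, n))) / L[i][i]
    z = [xj + r / math.sqrt(n) * uj for xj, uj in zip(x, u)]
    if hessian(A, z) is None:                  # proposal left K: reject
        return x
    log_ratio = log_density(A, z, x, r) - log_density(A, x, z, r)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return z
    return x

# Usage: a short walk on the square [-1, 1]^2 started at the origin.
rng = random.Random(0)
square = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
x = [0.0, 0.0]
for _ in range(1000):
    x = dikin_step(square, x, 0.5, rng)
print(x)
```

Working with log densities avoids overflow in the determinant factor, and the Cholesky factor serves both to sample the proposal and to evaluate $\ln\det D^2F(x)$.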

The mixing bounds in this paper are obtained by relating the Markov chain to the metric
of a Riemannian manifold studied in operations research [22, 27], rather than the Hilbert metric
[11, 16]. This metric possesses several potentially useful characteristics. For example, when the
convex set is a direct product of convex sets, this metric factors in a natural way into a product
of the metrics corresponding to the individual convex sets, which is not the case for the Hilbert
metric. Using results of Barthe [2] and Bobkov and Houdre [5] on the isoperimetry of product
manifolds, this leads to an improved upper bound on the mixing time when $K$ is a direct product
of convex sets, and opens up the future possibility of using differential-geometric techniques for
proving isoperimetric bounds, in addition to relying on the Localization Lemma, which underlies
the analysis of all Markov chains on convex sets ever since it was introduced in (Lovasz and
Simonovits [17]). Even if $K$ is a direct product of convex sets, the Dikin Markov chain itself does
not factor into a product of Dikin Markov chains, and Theorem 3 does not follow from a direct use
of the Localization Lemma.

Theorem 3. If an $n$-dimensional convex set $K := K_1 \times \dots \times K_N$ is the direct product of convex sets
$K_i$, each of which individually has a function $F_i$ with a complexity parameter (defined in Equation 1)
at most $\kappa$, then the mixing time of Dikin walk from a warm start on $K$ defined using the function
$\sum_{i=1}^{N} F_i$ is $O(\kappa n)$.

When there are $\Omega(n)$ factors, each of which is a polytope with $\kappa$ faces, the total number of
faces of $K$ is $\Omega(n\kappa)$. In this case, the results of [11] using the logarithmic barrier give a bound of
$O(\kappa n^2)$ while Theorem 3 gives a bound of $O(\kappa n)$.

4 Convex programming



The mixing results can be adapted to give a random walk based polynomial-time Las Vegas
algorithm for optimizing a linear function $c^Tx$ on certain convex sets $K$. The complexity of this
algorithm is roughly the same as that of the sampling algorithm.

Unlike other random walk based algorithms ([4]) this algorithm does not proceed in phases, but
consists of a single random walk $x_0, x_1, \dots$ on $K$. The algorithm here is a Las Vegas algorithm
rather than a Monte Carlo algorithm as was the case in [11]. It is also different in that the Markov
chain used here does not depend on $\epsilon$, the error tolerance.

We will consider convex programs specified as follows. Suppose we are given a convex set $K$
containing the origin as an interior point, and a linear objective $c$ such that

$$Q := K \cap \{y : c^Ty \le 1\}$$

is bounded, for any chord $pq$ of $Q$ passing through the origin, $\frac{|p|}{|q|} \le s$, and $\epsilon, \delta > 0$ (if $B(0, r) \subseteq Q \subseteq
B(0, R)$, then $s \le \frac{R}{r}$). Then, the algorithm is required to do the following.

• If $\exists x \in K$ such that $c^Tx \ge 1$, output $x' \in K$ such that $c^Tx' \ge 1 - \epsilon$.

Let $T : Q \to \mathbb{R}^n$ be defined by

$$T(x) = \frac{x}{1 - c^Tx},$$

and let $\hat{F}$ be a barrier for $\hat{K} := T(Q)$. Such a barrier can be easily constructed from $F$; details
follow Theorem 4.

Our algorithm for convex programming consists simply of doing a modified Dikin walk on $\hat{K}$
for a sufficient number of steps that depends on the desired accuracy $\epsilon$ and confidence $1 - \delta$. We
define $\hat{G}_x$ using $\hat{F}$ in the same way that $G_x$ was defined using $F$; details appear below.

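4.1 Algorithm 2

Let $x_0 = 0$. While $c^Tx_{i-1} < 1 - \epsilon$,

1. Toss a fair coin. If Heads, set $x_i := x_{i-1}$.

2. Else,

(a) Choose $z$ from the density $\hat{G}_{T(x_{i-1})}$.

(b) If $z \in \hat{K}$, let

$$x_i := \begin{cases} T^{-1}(z), & \text{with probability } \min\left(1, \frac{\hat{G}_z(T(x_{i-1}))}{\hat{G}_{T(x_{i-1})}(z)}\right) \\ x_{i-1}, & \text{otherwise.} \end{cases}$$

(c) If $z \notin \hat{K}$, let $x_i := x_{i-1}$.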



Theorem 4 (Las Vegas algorithm for optimization). Let $K$, $F$, $s$ and $r$ be as in Theorem 2. In
the cases where $F$ is a $\nu$-barrier or a hyperbolic barrier with parameter $\nu$, let $\tau(\epsilon, \delta)$ be set to
$O\left(n\nu\left(\ln\frac{1}{\delta} + n\ln\frac{s\nu}{\epsilon}\right)\right)$. If $\{c^Tx \ge 1\} \cap K$ is nonempty and $x_0, x_1, \dots$ is the modified Dikin walk
in Algorithm 2, then

$$P\left[\forall i > \tau(\epsilon, \delta),\ c^Tx_i \ge 1 - \epsilon\right] \ge 1 - \delta. \qquad (3)$$
Corollary 1. For any $k > 0$, with probability 1,

$$\lim_{i \to \infty} i^k(1 - c^Tx_i) = 0.$$



4.2 Constructing barriers 

The construction in [21] provides us with a barrier $\hat{F}_s$ on $\hat{K}$, given by

:= f A + f ?y V f rrV U 2.. l„(i + c-,) 



3^3 2^ V3y y V vi + c^y 

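whose self-concordance parameter is $\le (3.08\sqrt{\nu} + 3.57)^2$. If $F = -\ln p(x)$ where $p$ is a hyperbolic
polynomial of degree $\nu$, $\hat{F}_h$ is defined simply by

$$\hat{F}_h(y) := F\left(\frac{y}{1 + c^Ty}\right) + \ln(1 + c^Ty),$$

and has the same self-concordance parameter $\nu$. This applies to the special case of the Logarithmic
barrier as well. For any point $x \in \mathrm{int}(K)$, we use the Hessian matrix $D^2F$ to define a norm

$$\|v\|_x := \left(v^TD^2F(x)v\right)^{1/2}.$$

The corresponding Gaussian density is

$$G_x(y) := \left(\frac{n}{2\pi r^2}\right)^{n/2}\exp\left(-\frac{n\|x - y\|_x^2}{2r^2} + V(x)\right),$$

where

$$V(x) = \frac{1}{2}\ln\det D^2F(x).$$

For $x \notin \mathrm{int}(K)$, for any $y \ne x$, $G_x(y) := 0$. Let

$$s := \sup_{pq \ni 0}\frac{|p|}{|q|},$$

where the supremum is taken over all chords of $K$ containing the origin.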

5 Metric defined by a barrier 

For any smooth strictly convex function $G$, the Hessian $D^2G$ is positive definite. Given the barrier
$F$, for every $x \in \mathrm{supp}(F)$ and $u, v \in \mathbb{R}^n$,

$$\langle u, v\rangle_x := D^2F(x)[u, v]$$

is bilinear, and $\|u\|_x = \sqrt{\langle u, u\rangle_x}$ is a norm. In addition, we define

1. $\langle u, v\rangle_x^{\ell} := D^2F_\ell(x)[u, v]$ and $\|u\|_x^{\ell} := \sqrt{\langle u, u\rangle_x^{\ell}}$,

2. $\langle u, v\rangle_x^{h} := D^2F_h(x)[u, v]$ and $\|u\|_x^{h} := \sqrt{\langle u, u\rangle_x^{h}}$, and

3. $\langle u, v\rangle_x^{s} := D^2F_s(x)[u, v]$ and $\|u\|_x^{s} := \sqrt{\langle u, u\rangle_x^{s}}$.

We define

$$d(x, y) = \inf_{\Gamma}\int_{z}\|d\Gamma\|_z$$

where the infimum is taken over all rectifiable paths $\Gamma$ from $x$ to $y$. Let $\mathcal{M}$ be the metric space
whose point set is $K$ and metric is $d$. $d_\ell$, $d_h$ and $d_s$ are defined analogously in terms of the respective
norms $\|\cdot\|^{\ell}$, $\|\cdot\|^{h}$ and $\|\cdot\|^{s}$.

Lemma 2 (Nesterov-Todd (Lemma 3.1 [27])). If $\|x - y\|_x < 1$ then,

$$\|x - y\|_x - \|x - y\|_x^2 \le d(x, y) \le -\ln(1 - \|x - y\|_x).$$

While some of the presented bounds can be obtained from the isoperimetric bounds for the
"Hilbert metric" (Theorem 6) proved by Lovasz, we can prove stronger results for sampling certain
classes of convex sets such as the direct product of an arbitrary number of convex sets, by using
results of Barthe [2] and Bobkov and Houdre [5] on the isoperimetry of product spaces, which do not
seem to follow directly from the Hilbert metric. In particular, for a direct product of an arbitrary
number of polytopes, each defined by $O(\kappa)$ constraints, this allows us to show an upper bound
on the mixing time from a warm start of $O(\kappa n)$. The bound obtained using the Hilbert metric
in the obvious way is $O(\kappa n^2)$, since the Hilbert metric on a direct product does not decompose
conveniently into factors as does the Riemannian metric.

Riemannian metrics defined in this way have been studied because of their importance in convex 
optimization, for example, by Nesterov and Todd in [27] and by Nesterov and Nemirovski in [22], 
and Karmarkar studied the properties of a related metric [14] that underlay his celebrated algorithm 
[13]. For other work on sampling Riemannian manifolds motivated by statistical applications, see 
[15, 20], Chapter 8 [19]. 



5.1 Isoperimetry 

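Let $\mathcal{M}$ be a metric space endowed with distance function $d$ and $\mu$ be a probability measure on it.
We term $(S_1, \mathcal{M} \setminus S_1 \setminus S_2, S_2)$ a $\delta$-partition of $\mathcal{M}$, if

$$\delta \le d_{\mathcal{M}}(S_1, S_2) := \inf_{x \in S_1,\, y \in S_2} d_{\mathcal{M}}(x, y),$$

where $S_1, S_2$ are measurable subsets of $\mathcal{M}$. Let $\mathcal{P}_\delta$ be the set of all $\delta$-partitions of $\mathcal{M}$. The
isoperimetric constant $\beta_{fat}(\delta, \mathcal{M}, \mu)$ is defined as

$$\inf_{\mathcal{P}_\delta}\frac{\mu(\mathcal{M} \setminus S_1 \setminus S_2)}{\mu(S_1)\mu(S_2)}.$$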

Given interior points $x, y$ in $\mathrm{int}(K)$, suppose $p, q$ are the ends of the chord in $K$ containing $x, y$
and $p, x, y, q$ lie in that order. Denote by $d_H$ the Hilbert (projective) metric defined by

$$d_H(x, y) := \ln\left(1 + \frac{|x - y||p - q|}{|p - x||q - y|}\right).$$
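
The Hilbert metric is additive along a chord, a property used in the proof of Lemma 3. The following minimal sketch evaluates $d_H$ for collinear points described by scalar coordinates along the chord; the function name and the sample coordinates are assumptions made for the illustration.

```python
import math

def hilbert_metric(p, x, y, q):
    """d_H(x, y) = ln(1 + |x-y||p-q| / (|p-x||q-y|)) for collinear p, x, y, q in that order.

    The four arguments are scalar coordinates along the chord pq.
    """
    return math.log(1.0 + abs(x - y) * abs(p - q) / (abs(p - x) * abs(q - y)))

p, q = 0.0, 1.0
x, z, y = 0.2, 0.5, 0.9
whole = hilbert_metric(p, x, y, q)
parts = hilbert_metric(p, x, z, q) + hilbert_metric(p, z, y, q)
print(whole, parts)   # the two values agree up to rounding

# Affine invariance: rescaling and shifting the chord leaves d_H unchanged.
print(hilbert_metric(3.0, 3.0 + 7 * x, 3.0 + 7 * y, 10.0))
```

Additivity follows because $1 + \frac{|x - y||p - q|}{|p - x||q - y|}$ equals the cross-ratio $\frac{|y - p||q - x|}{|x - p||q - y|}$, and cross-ratios multiply along the chord.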



Let $\beta_{fat} := \beta_{fat}(\delta, \mathcal{M}, \mu)$, where $\delta = \frac{1}{\sqrt{n}}$.



Theorem 5. If $F$ is the self-concordant barrier of $K$ with complexity parameter $\nu$, presented in
the format of Subsection 1.3,

$$\beta_{fat} = \Omega\left(\frac{1}{\sqrt{n\nu}}\right).$$

Proof.

Lemma 3. 1. $d_s(x, y) \le 2(1 + 3\nu_s)\,n\,d_H(x, y)$.

2. $d_h(x, y) \le \sqrt{n\nu_h}\,d_H(x, y)$.

3. $d_\ell(x, y) \le \sqrt{m}\,d_H(x, y)$.

Proof. For any $z$ on the segment $xy$, $d_H(x, z) + d_H(z, y) = d_H(x, y)$. Therefore it suffices to prove
the result infinitesimally. By Lemma 2,

$$\lim_{y \to x}\frac{d(x, y)}{\|x - y\|_x} = 1,$$

and a direct computation shows that

$$\lim_{y \to x}\frac{d_H(x, y)}{|x - y|_x} \ge 1.$$

Lemma 3 follows from Theorems 7 and 8. □
Theorem 5 follows from Theorem 6 and Lemma 3. □
Theorem 6 (Lovasz, [16]). Let $S_1$ and $S_2$ be measurable subsets of $K$. Then,

$$\mathrm{vol}(K \setminus S_1 \setminus S_2)\,\mathrm{vol}(K) \ge \left(e^{d_H(S_1, S_2)} - 1\right)\mathrm{vol}(S_1)\,\mathrm{vol}(S_2).$$

For $x \in K$ and a vector $v$, $|v|_x$ is defined to be $\left(\sup\{\alpha : x \pm \alpha v \in K\}\right)^{-1}$.

Theorem 7 (Theorem 2.3.2 (iii), [23]). Let $F$ be a self-concordant barrier whose self-concordance
parameter is $\nu_s$ as defined in Section 2. Then, for all $h \in \mathbb{R}^n$ and $x \in \mathrm{int}(K)$,

$$|h|_x \le \sqrt{D^2F(x)[h, h]} \le 2(1 + 3\nu_s)|h|_x.$$

The following result is implicit in [9].

Theorem 8 (Güler, [9]). Let $-\ln p(x)$ be a hyperbolic barrier for $K$, where $p$ has degree $\nu_h$. Then,
for all $h \in \mathbb{R}^n$ and $x \in \mathrm{int}(K)$,

$$|h|_x \le \sqrt{D^2F(x)[h, h]} \le \sqrt{\nu_h}\,|h|_x.$$



6 Analysis of the mixing time 

We denote the marginal distribution of $x_{i+1}$ given $x_i = x$ by $P_x$. Lemma 4 is a statement about the
concentration of derivatives of odd order in high dimension. It will be used in the proof of Lemma
6. Lemma 5 states that if the unit Dikin ellipsoid around a point contains the unit ball, then
the points at which a random line through $x$ chosen from the distribution induced by the uniform
measure on the unit sphere intersects the boundary are, with high probability, at a distance $\Omega^*(\sqrt{n})$
from $x$. This Lemma is used in the proof of Claim 3.

Lemma 4 (Concentration bound). Let $h$ be chosen uniformly at random from the unit sphere
$S^{n-1} = \{u \mid \|u\| = 1\}$. Then, for any odd $k$,

$$P\left[D^kF(x)[h, \dots, h] > k\epsilon\sup_{\|v\| \le 1} D^kF(x)[v, \dots, v]\right] \le \exp\left(-\frac{n\epsilon^2}{2}\right).$$

If $F$ is a self-concordant barrier, and

$$\forall v,\ \|v\|^2 \ge D^2F(x)[v, v],$$

when $k = 3$, this simplifies to

$$P\left[D^3F(x)[h, h, h] > 3\epsilon\right] \le \exp\left(-\frac{n\epsilon^2}{2}\right).$$

Proof. The "Bernstein inequality" of Gromov (Section 8.5, [8]), which applies to multivariate
polynomials restricted to $S^{n-1}$, states that for any polynomial $p$ on $S^{n-1}$ of degree $k$,

$$\sup_{h \in S^{n-1}}\|\mathrm{grad}\,p(h)\| \le k\sup_{h \in S^{n-1}}|p(h)|.$$

For any fixed $x$, $D^kF(x)[h, \dots, h]$ is a polynomial in $h$ of degree $k$. Therefore

$$\frac{D^kF(x)[h, \dots, h]}{k\sup_{\|v\| \le 1}|D^kF(x)[v, \dots, v]|}$$

is 1-Lipschitz on $S^{n-1}$. If $k$ is odd, $D^kF(x)[h, \dots, h] = -D^kF(x)[-h, \dots, -h]$, and therefore its
median with respect to the uniform measure $\sigma$ on the unit sphere is 0. The first part of the lemma
follows from the measure concentration properties of Lipschitz functions on the sphere (page 44 in
[1]); namely, if $f$ is a 1-Lipschitz function on the unit sphere and $M$ is its median, then

$$\sigma(f \ge M + \epsilon) \le e^{-\frac{n\epsilon^2}{2}}. \qquad (4)$$

When $F$ is a self-concordant barrier, the second claim in the lemma follows because

$$\sup_{\|v\| \le 1} D^3F(x)[v, v, v] \le \sup_{\|v\|_x \le 1} D^3F(x)[v, v, v] \le 1.$$

□

Lemma 5. Let $P$ be a polytope and $x$ a point in it. Let the Dikin ellipsoid at $x$ with respect to the
logarithmic barrier contain the unit ball. Let $v$ be chosen uniformly at random from the unit
sphere centered at $x$ and $\ell$ be the line through $x$ and $v$, and $p$ and $q$ be the two points of intersection
of $\ell$ with the boundary $\partial P$. Then, for any constant $a > 0$,

$$P\left[\min(\|p - x\|, \|q - x\|) \le \sqrt{\frac{n}{2(a + 2\ln n)}}\right] \le 2e^{-a}. \qquad (5)$$



Proof. Without loss of generality, we may assume $x$ to be the origin. The unit ball is contained in
the Dikin ellipsoid and so $P$ can be expressed as $\bigcap_{i=1}^m \{a_i^T x \le 1\}$, where

$$I \succeq \sum_i a_i a_i^T. \quad (6)$$

Examining the trace and the norm on both sides of (6), we obtain

$$\forall i,\ \|a_i\| \le 1$$

and

$$\sum_{i=1}^m \|a_i\|^2 \le n.$$

We note that

$$\min(\|p\|, \|q\|) = \min_i |a_i^T v|^{-1}. \quad (7)$$

Thus, it is sufficient to show that

$$\mathbb{P}\left[\min_i |a_i^T v|^{-1} < \sqrt{\frac{n}{2(\alpha + 2\ln n)}}\right] \le 2e^{-\alpha}, \quad (8)$$



which we proceed to do. Let S be the subset of [m] := {1, . . . , m}, consisting of those i for which 
llotll > Clearly, if for some i, (afv)'^ > ; then i £ S. By (6), \S\ < n^. Thus, by the 



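union bound,

$$\mathbb{P}\left[\max_i (a_i^T v)^2 > \frac{2(\alpha + 2\ln n)}{n}\right] \le n^2 \sup_i \mathbb{P}\left[(a_i^T v)^2 > \frac{2(\alpha + 2\ln n)}{n}\right]. \quad (9)$$

We note that, by (4), for any vector $w$ with norm less than or equal to 1,

$$\mathbb{P}\left[w^T v > \sqrt{\frac{2(\alpha + 2\ln n)}{n}}\right] \le \frac{e^{-\alpha}}{n^2}, \quad (10)$$

and so

$$\mathbb{P}\left[\max_i (a_i^T v)^2 > \frac{2(\alpha + 2\ln n)}{n}\right] \le 2e^{-\alpha}. \quad (11)$$

□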
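The statement of Lemma 5 can be probed numerically on a simple example. The sketch below is an illustration under stated assumptions, not part of the paper: it takes the cube $\{x : |x_i| \le \sqrt{2}\}$, written as $\{a_i^T x \le 1\}$ with $\sum_i a_i a_i^T = I$, draws unit directions $v$ through the origin, and counts how often the shorter half-chord falls below the threshold $\sqrt{n / (2(\alpha + 2\ln n))}$. The function names are hypothetical.

```python
import math
import numpy as np

def chord_distances(A, v):
    """For P = {x : A x <= 1} and a unit direction v through the origin,
    return the distances from the origin to the boundary along +v and -v."""
    s = A @ v
    pos, neg = s.max(), (-s).max()
    return (1.0 / pos if pos > 0 else math.inf,
            1.0 / neg if neg > 0 else math.inf)

def short_chord_frequency(n, alpha, num=2000, seed=0):
    """Fraction of random unit directions whose shorter half-chord is below
    sqrt(n / (2 (alpha + 2 ln n))) for the illustrative cube below."""
    rng = np.random.default_rng(seed)
    # Rows a_i = +-e_i / sqrt(2), so that sum_i a_i a_i^T equals the identity.
    A = np.vstack([np.eye(n), -np.eye(n)]) / math.sqrt(2.0)
    threshold = math.sqrt(n / (2.0 * (alpha + 2.0 * math.log(n))))
    hits = 0
    for _ in range(num):
        v = rng.standard_normal(n)
        v /= np.linalg.norm(v)
        if min(chord_distances(A, v)) < threshold:
            hits += 1
    return hits / num

freq = short_chord_frequency(100, 1.0)
print(freq, 2 * math.exp(-1.0))
```

For this particular polytope the observed frequency should sit far below $2e^{-\alpha}$; the lemma only claims an upper bound.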

Proof of Theorem 2. In order to obtain mixing time bounds, we will first prove that if two points
$x$ and $y$ are nearby in that $d(x,y) \le O(1/\sqrt{n})$, then the total variation distance between the
corresponding marginals $P_x$ and $P_y$ is at most $1 - \Omega(1)$.

Without loss of generality, let $x$ be the origin (which is achievable by translation), and for
any $v$, let $D^2F(0)[v,v] = \|v\|^2$ (which is achievable by an affine transformation of $K$).

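For $x \ne y$,

$$1 - d_{TV}(P_x, P_y) = \mathbb{E}_z\left[\min\left(1, \frac{G_y(z)}{G_x(z)}, \frac{G_z(x)}{G_x(z)}, \frac{G_z(y)}{G_x(z)}\right)\right], \quad (12)$$

where the expectation is taken over a random point $z$ from the density $G_x$ and

$$\min\left(1, \frac{G_y(z)}{G_x(z)}, \frac{G_z(x)}{G_x(z)}, \frac{G_z(y)}{G_x(z)}\right)$$

is defined to be 0 if $z \notin K$.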
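The expectation on the right-hand side of (12) can be estimated by Monte Carlo. The following is a minimal sketch for a polytope with the logarithmic barrier only, assuming the Gaussian proposal has the form $G_x(z) \propto \sqrt{\det D^2F(x)}\, \exp(-n\|z - x\|_x^2 / r^2)$; that form, the cube used as $K$, and all function names are assumptions made here for illustration rather than definitions taken from this section.

```python
import numpy as np

def hessian(A, b, x):
    """Hessian of the log barrier F(x) = -sum_i ln(b_i - a_i^T x)."""
    s = b - A @ x
    return (A / s[:, None] ** 2).T @ A

def log_G(A, b, center, z, r):
    """Log of the unnormalised density G_center(z) (assumed Gaussian form)."""
    n = len(center)
    H = hessian(A, b, center)
    d = z - center
    _, logdet = np.linalg.slogdet(H)
    return 0.5 * logdet - n * float(d @ H @ d) / r**2

def overlap(A, b, x, y, r=0.5, num=4000, seed=0):
    """Estimate E_z[min(1, G_y(z)/G_x(z), G_z(x)/G_x(z), G_z(y)/G_x(z))], z ~ G_x."""
    rng = np.random.default_rng(seed)
    n = len(x)
    L = np.linalg.cholesky(hessian(A, b, x))          # H(x) = L L^T
    total = 0.0
    for _ in range(num):
        g = rng.standard_normal(n)
        # Covariance (r^2 / (2n)) H(x)^{-1} matches the assumed density above.
        z = x + (r / np.sqrt(2 * n)) * np.linalg.solve(L.T, g)
        if np.any(A @ z >= b):
            continue                                   # the minimum is 0 outside K
        base = log_G(A, b, x, z, r)
        logs = [log_G(A, b, y, z, r), log_G(A, b, z, x, r), log_G(A, b, z, y, r)]
        total += float(np.exp(min(0.0, min(logs) - base)))
    return total / num

n = 5
A = np.vstack([np.eye(n), -np.eye(n)])                # the cube [-1, 1]^n
b = np.ones(2 * n)
x = np.zeros(n)
y = np.zeros(n); y[0] = 0.01
estimate = overlap(A, b, x, y)
print(estimate)
```

When $y$ is close to $x$ in the local norm, the estimate stays bounded away from 0, which is the qualitative behaviour the proof establishes.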

We will use the following fact (see Section 2.2, [25]) with $D^kF$ in the place of $M$.

Fact 1. Let $M[h_1, \dots, h_k]$ be a symmetric $k$-linear form on $\mathbb{R}^n$. Then,

$$M[h_1, \dots, h_k] \le \|h_1\| \cdots \|h_k\| \sup_{\|v\| \le 1} M[v, \dots, v].$$

Fact 2. Let the eigenvalues of the covariance matrix of an $n$-dimensional Gaussian $g$ be bounded
above by $\lambda$. Let $\langle \cdot, \cdot \rangle$ be an inner product and $v \in \mathbb{R}^n$. Then, $\mathbb{E}[\langle v, g \rangle^2] \le \lambda \langle v, v \rangle$.
6.1 Relating the Markov Chain to the manifold

We will frequently make statements of the form

$$\mathbb{P}[g(x) > O(f(x))] \le c.$$

By this we mean, there exists a universal constant $C$ such that

$$\mathbb{P}[g(x) > C f(x)] \le c.$$
We will use the following fact repeatedly: 

Fact 3. Suppose $x \to z$ is a transition of the Dikin walk, then,

1 



\x ■ 



z < 



> 1 - 10 



-3 



This can be ensured by setting the value of r to be a sufficiently small constant. 

Finally, we will frequently make use of the facts from (Theorem 2.1.1, [23]) stated below that
Dikin ellipsoids vary smoothly, and that they are contained in the convex set.

• Given any self-concordant barrier $F$, for any $y$ such that

$$D^2F(x)[x - y, x - y] = r^2 < 1,$$

for any vector $h \in \mathbb{R}^n$,

$$(1 - r)^2 D^2F(x)[h, h] \le D^2F(y)[h, h] \le \left(\frac{1}{1 - r}\right)^2 D^2F(x)[h, h]. \quad (13)$$

• The Dikin ellipsoid centered at $x$, having radius 1, is contained in $K$.
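Both facts are easy to test numerically for the logarithmic barrier of a polytope. The sketch below is an illustration only: the small polygon, the interior point, and the helper names are hypothetical. It samples points $y$ at local distance $r < 1$ from $x$, then checks that each lies inside the polytope and that the two-sided Hessian comparison (13) holds for a random direction $h$.

```python
import numpy as np

def hessian(A, b, x):
    """Hessian of the log barrier F(x) = -sum_i ln(b_i - a_i^T x)."""
    s = b - A @ x
    return (A / s[:, None] ** 2).T @ A

def check_dikin_facts(A, b, x, r=0.5, trials=200, seed=0):
    """Return True if sampled y with ||y - x||_x = r stay in K and satisfy (13)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    Hx = hessian(A, b, x)
    tol = 1 + 1e-12                                  # slack for round-off only
    for _ in range(trials):
        u = rng.standard_normal(n)
        y = x + r * u / np.sqrt(u @ Hx @ u)          # local norm of y - x equals r
        if np.any(A @ y >= b):
            return False                             # would contradict containment
        Hy = hessian(A, b, y)
        h = rng.standard_normal(n)
        qx, qy = h @ Hx @ h, h @ Hy @ h
        if not ((1 - r) ** 2 * qx <= qy * tol and qy <= qx / (1 - r) ** 2 * tol):
            return False
    return True

# Hypothetical example: a pentagon in the plane and an interior point.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 1.0, 1.0, 1.5])
x0 = np.array([0.2, -0.1])
print(check_dikin_facts(A, b, x0, r=0.5), check_dikin_facts(A, b, x0, r=0.9))
```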



For two probability distributions $P_x$ and $P_y$, let $d_{TV}(P_x, P_y)$ represent the total variation distance
between them.

Lemma 6 (Relating d to Markov Chain). If x,y G K and d{x,y) < then dTv{Px,Py) = 

1-0(1). 

Proof. Without loss of generality, we may assume that $F_\ell$, $F_h$ and $F_s$ are strictly convex. In case any
one is not, we can add the strictly convex logarithmic barrier of a sufficiently large cube, thereby
making an arbitrarily small change to its second, third and (if it is not $F_s$) fourth order derivatives
uniformly over $K$. Due to affine invariance, without loss of generality, let $\langle u, v \rangle_x = \langle u, v \rangle$, the
usual dot product. As defined in Section 3, for any $z \in K$,

$$V(z) := \frac{1}{2} \ln \det D^2F(z).$$



By Lemma 2, it suffices to prove that there is an absolute constant C such that if x,y € K and 
ll^; — < then dxviPx, Py) = 1 — ^{1). Without loss of generality, we assume x is the origin 
and we drop this subscript at times to simplify notation. 

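$$1 - d_{TV}(P_x, P_y) = \mathbb{E}_z\left[\min\left(1, \frac{G_y(z)}{G_x(z)}, \frac{G_z(x)}{G_x(z)}, \frac{G_z(y)}{G_x(z)}\right)\right], \quad (14)$$

where the expectation is taken over a random point $z$ having density $G_x$. Thus, it suffices to prove
the existence of some absolute constant $c$ such that

$$\mathbb{P}\left[\min\left(\frac{G_y(z)}{G_x(z)}, \frac{G_z(x)}{G_x(z)}, \frac{G_z(y)}{G_x(z)}\right) > c\right] = \Omega(1).$$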
This translates to

$$\mathbb{P}\left[\max\left(n\|y - z\|_y^2 - r^2 V(y),\ n\|z\|_z^2 - r^2 V(z),\ n\|z - y\|_z^2 - r^2 V(z)\right) - n\|z\|^2 \le O(r^2)\right] = \Omega(1).$$

We will prove the following lemmas.

Lemma 7. $-V(y) \le O(1)$.

Lemma 8.

$$\mathbb{P}\left[-V(z) \le O(1)\right] \ge \frac{9}{10}. \quad (15)$$



Next, we will prove the following probabilistic upper bound, thereby completing the proof.

Proposition 1.

$$\mathbb{P}\left[\max\left(\|y - z\|_y^2,\ \|z\|_z^2,\ \|z - y\|_z^2\right) - \|z\|^2 \le O\left(\frac{1}{n}\right)\right] \ge \frac{199}{1000}. \quad (16)$$

Proof. Since ||y|| < 0{-^) and \\z\\ < ^ with probability greater than 1 — 10 ^, \\y\\y and \\y\\z are 
O(^). So it suffices to show that 



max \\z\ 



\zf,< y,z >y, \\z\\l - \\zf,< y,z >z) < O(-) 



n 



> 



10' 



(17) 



This fact follows from the following three lemmas and the union bound. The proof of Lemma 9 
would go through if ^ were replaced by ^ — r2(l). 



Lemma 9. 



Lemma 10. 



Lemma 11. 



max{\\z\\l-\\zf,<y,z>y,\\z\\l-\\zf,<y,z>,)<0{-) > — . (18) 



max ( llzll^ - ||z||^, <y,z >y, ||z||^ - ||z||^, <y,2; >2 ) < O(^) > — . (19) 



10 



max(^\\z\\l-\\zf,<y,z>y,\\z\\l-\\zf,<y,z>zj<0{^) > —. (20) 

□ 

□ 

Proof of Lemma 7. Let F := F-F. Fox w e K, let X{w) := D'^F{w) - D'^F{0). By Lemma 12 in 
[11], for any point w € K, the gradient of TrX{w) measured using || • ||^ is < Therefore, the 

gradient of TrX(z«) measured using ||-|| is < 0{^/n). X{0) = 0, therefore, |TrX(j/)| < 0(||y||y^) = 
0(1). 

||y|| = 0(1/V^), therefore, ||X|| = 0(||y|| sup|,^|,<„j^|, \\D^F{w)\\) = 0{1/^). 

For weK, let Y{w) := D^F{w) - D^F{0). 
Then 

\\Y{y)\\=D^F{y)-D^F{0) = O{\\y\\ sup \\D^F{w)\\) = 0{l/n). 

\M\<\\y\\ 
Therefore, $|\mathrm{Tr}\, Y(y)| = O(1)$.



-V{y) = -lndet(I + X + y) (21) 
= -Tr{X + Y + R), (22) 

where i? is a matrix whose 11 • 11 — )• 11 • 11 norm ||i?|| is bounded above by 0(max(||Xf , )) = 0(1). 
Thus, |T^(y)| = 0(1), and Lemma 7 is proved. □ 
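The step from the log-determinant to a trace rests on the expansion $\ln\det(I + A) = \mathrm{Tr}\,A - \frac{1}{2}\mathrm{Tr}\,A^2 + \cdots$, whose remainder after the first term is at most $n\rho^2 / (2(1 - \rho))$ when the spectral norm $\rho$ of the symmetric matrix $A$ is below 1. The sketch below checks this elementary bound numerically; the random matrix and the name `trace_remainder` are illustrative assumptions. With $\rho$ of order $1/\sqrt{n}$, as in the proof, the remainder is $O(1)$.

```python
import numpy as np

def trace_remainder(A):
    """Return (|ln det(I + A) - Tr A|, n * rho^2 / (2 (1 - rho))) for symmetric A
    with spectral norm rho < 1."""
    n = A.shape[0]
    rho = np.linalg.norm(A, 2)
    if rho >= 1:
        raise ValueError("spectral norm must be below 1")
    sign, logdet = np.linalg.slogdet(np.eye(n) + A)
    remainder = abs(logdet - np.trace(A))
    bound = n * rho**2 / (2.0 * (1.0 - rho))
    return float(remainder), float(bound)

rng = np.random.default_rng(0)
n = 100
G = rng.standard_normal((n, n))
S = (G + G.T) / 2.0
A_small = S * ((1.0 / np.sqrt(n)) / np.linalg.norm(S, 2))   # spectral norm 1/sqrt(n)
rem, bnd = trace_remainder(A_small)
print(rem, bnd)
```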

Proof of Lemma 8. Let F := F - F. For w G K, let W{w) := D'^F{w), and let Z{w) := D'^F{w). 

V{z) = ^lndet{W{z) + Z{z)) 

= ^ {Tr\n{W{z) + Z{z))) with probability > 1 - 10"^ (i.e. if \\W{z) + Z{z) - I\\ < 1) 

= -^{TriWiz)-WiO)) + Tr{Ziz)-Z{0))) (23) 

+ ](TriWiz) + Ziz)-lf) (24) 



4 

- O (^\Tr {I -{W{z) + Z{z))f\^ with probability > 1 - 10"^ (25) 

The lemma will follow from the following claims: 
Claim 1. 

P[Tr(Z(z)-Z(0))<O(l)]>^. 

Claim 2. 

F[TriW{z)-Wm<0{l)]>^. 

Claim 3. 

|Tr (/ - {W{z) + Z{z))f I < 0(1)] > ^. (26) 

□ 

Proof of Claim 1. Let Zh{w) := D^F{w) and Zs = D^F{w). Then, Z{w) = nZh + n^Zs. Next, 

TrZ,(z) - TrZ,(0) = Z)TrZ,(0)N + D'TrZ^{z')[z,z] ^ ^^7) 

for some z' G [0, z]. 



|Z)TrZ,,(0)[z]| < 0\^ys\XY> \D^F[v,v,v\\jj with probability > 1 - 10"^ (28) 

< oini sup \D^F[v,v,v\\ I I with probability > 1 - 10"^ (29) 

V \S\v\\=l/^ J ) 

< 0{l/^/n) with probability > 1 - 10"^ , (30) 
which is at most 0{y/n). Applying Lemma 4, we see that 

998 

P[-I)TrZ,(0)[z]<O(l/n)]>— . (31) 
$$D^2\,\mathrm{Tr}\,Z(z')[z, z] = D^2\,\mathrm{Tr}\,D^2F(z')[z, z] \quad (32)$$

In order to bound the above quantity, let A be an invertible matrix such that Z{z') = A^Z{0)A. 
Such a matrix A exists for which ||^ — /|| = 0{l/^/n) with probability > 1 — 10~^ because \\z'\\ = 
0{l/^/n) with probability > 1 — 10^"^. Let Da be the differential operator whose action on a 
function G is determined by the relation 

e M", DaG{w)[v] := DG{w)[Av]. 

Thus D\F{z') = Z/,(0). Now, 

ijull = 0(1) ^'i\\v\\ = l,D'^F{z')[u,u,v,v] < 0(1). (33) 



D^TrZhiz'))[z,z] = D^(TrD\F{0)yz,z] 

< n sup 0(1)^^(^)^,^^,2,-2]) with probability > 1 - 10"^ 

\\Av\\=l 

< sup /^^[i;, i;, t;, t;] (by Fact 1) 

\\v\\=l/^ 

= 0(l/n) with probability > 1 - 10"^ . 
Therefore, by Equations 27, 31 and 34, we have 

997 



(34) 



iTrZh{z) - TrZhm < 0(l/n)] > 



1000 



with probability > 1 — 10 ^ , ||z|| = 0{l/n), therefore with probability > 1 — 10 

TrZ,(z) -TrZ,(0) = O(TrZ,(0)yz||) 
= 0(l/n2). 

The claim follows from the last two sentences, since Z = nZ^ + nPZ^. 



-3 



Proof of Claim 2. 



TrW{z) - TrTy(O) = DTrW{0)[z] + 



D'^TrW{z') 



(35) 
(36) 

□ 



(37) 



for some z' G [0,z]. Lemma 12 in (Kannan and Narayanan, [11]) shows that ||VTrVF|| < 2-^/n. 
Since for all vectors v, \\v\\ > ||f ||, this implies that ||VTrT4^|| < 2-y/n. By Lemma 4, this implies 
that 

P[DTrVF(0)[z] < 0(1)] > 1 - 10~^. (38) 
By Lemma 13 in (Kannan and Narayanan, [11]), ^ TrW{z ) ^ thereby completing the proof. □ 
Proof of Claim 3. In order to prove that 



\Tr {I -{W{z) + Z{z))f\ <0(1) 



> 



99 

Too' 



it suffices to show that 



and that 



\\{W{z)-W{Q))\\ < 0{n~^l^] 



{Z{z)-Z{m\<0{n-'"] 



> 1 - 10" 



> 1 - lO"''. 



From Lemma 5 and Theorem 1, we obtain (39). We obtain (40) from (13). 



(39) 

(40) 

□ 
6.2 Regularity of the metric defined by the Logarithmic barrier

Proof of Lemma 9. Let F{w) := — Xli^i < a^, v >), for any v £ K. Thus 



Fixing an orthonormal basis with respect to < • , • > , 

m 



Il2 ^ II ||2 

v\\ < \\v\\ . 



i=l 



where X signifies that Y dominates X in the semidefinite cone. 

Recall that for any v such that \\v\\ = 1, IE(< v,z >^) = < 1/C for some sufficiently large 
constant C. It suffices to prove the following two inequalities. 



Lemma 12. 



Lemma 13. 



max (llzll^ - ||z||^, <y, z >y, <y, z >^ < O(-) 



> 



19 
20' 



\zf<0{- 



n 



> 



20 



Proof of Lemma 12. 



I _ n ||2 



D^F{y)[z,z\-D^Fmz,z\ 
= D^F{w)[y,z,z], 

for some w G [0,y] and consequently WwW = 0{l/^/n) and hence 

D^F{w)[y, z, z] = {l + o{l))D'F{d)[y, z, z]. 



(41) 

(42) 

□ 



(43) 



E 



^{y'^ aiaj z){a^ z) 



Therefore, 



P [D^F{<d)[y,z,z\ < 0{l/n) 



< E 



E 



,i=l 



|y(^a.af)f||zr/n 



0(l/n2). 



-2X;(af2/)(afz)2<l/n) 



i=l 



> 1 - O n^E 



^{y^ aiaj z){aj z) 



> 1 - 10"^ By (44) 



(44) 
Proceeding to the next term,

¥[<y,z>y=0{l/n)\ = ¥[<y,z> +{<y,z>y-<y,z>)=0{l/n)] 
<y,z>y -<y,z>= 0{\\z\\/n) 

by (13), and 

E[<y,z>2] <0(l/n2), 

so we obtain 

P [<y, z >y= 0(l/n)] > 1 - 10^1 

Finally, 

<y, z >, -<y, z >= D^F{w)[y, z, z] 

for some w £ [0,y] (and hence \\w\\ = 0{l/^/n)). 

From Equations (47), (45) and (45), it follows that 

F[<y,z>,= Oil/n)] > 1 - 2(10-3). 

Lemma 12 follows from Equations (45), (46) and (48). 

Proof of Lemma 13. In order to prove that 



\zf<0{^] 
n 



> 



20 



it suffices to show that 



+ \\zf_,) /2<\\zf + 0i-) 

n 



> 



10' 



because the distribution of z is symmetric about the origin. 



E 



+ 



2(1 - a/ z)2 2(1 + aj z 



(1 - {aJzYY 



i 
i 



(1 - (af z)2)2 
3{ajzr - {ajzf 



The probability that \aj z\ > n~i is 0(6^^/^). \aj z\ is < ||a?^||r, which is less than 
allows us to write with probability > 1 — 10"^ 

-^aJzY-iaJzf- 



E 



(1 - {aJzYY 



3E[(afz)4](l + o(l)), 



which is 0(||ai||^/n2). Since Yli^i'^J ^ Vi, ||aj|| < 1 and Y2i ^ 



Therefore, 



]z\\l + \\z\\l,)/2<\\zf + 



100, 



n 



E 



3(af z)' - {aJzf 



< ||ai||V(100n2 

i 

< llaillV(lOOn) 

< 1/100. 
6.3 Regularity of the metric defined by the Hyperbolic Barrier

Proof of Lemma 10. We will prove upper bounds on each of (a) $\|z\|_y^2 - \|z\|^2$, (b) $\langle y, z \rangle_y$,
(c) $\|z\|_z^2 - \|z\|^2$ and (d) $\langle y, z \rangle_z$ that hold with constant probability, and then use the union bound.
We will repeatedly use the observation (that holds from Fact 2) that for any point $w$ such that
$\|w\| = o(1)$,



and with probability > 1 — 10 



(a) 



|y|U<0(n-i/2||y||)<of- 



z\\w < 0{n-'^/^\\z\\) < 0(1/V^) 



I l|2 _ II ||2 
\Z\\y ll^ll 



D^F{y)[z,z]-D^F{0)[z,z] 
D^F[w][y,z,z], 



for some w on the line segment [0,y]- 
By Fact 1, 

^D^F{w)[y,z,z] < 0{n 



-2^ 



> 



|y|U(||^IU)'<0(^) 



Since = ||z|p + n||z|p + n-^Hzp, we have 

||z||^ = 0{\\z\\^u/^/ri). 
Also, = 0(||?/||/-y/n) = 0(l/n), and so with probability > 1 — 10~^ , 

0(]\z\U) = 0{\\z\\/Vn) = 0{l/Vn). 

Thus, 



n\/n 



and 



\y\\wi\\z\Uf < 0{^) 



> 1-10" 



> 1-10" 



(b) 



<y,z>y = <y,z > +{<y,z >y -<y,z >) 

= <y,z> +D^F{w)[y,y,z] (for u; G [0, y]) 

= 0( — ) + 0(|jy||^i^|U) with probability > 99/100. 
n 



(56) 



(57) 



(58) 



(59) 



(60) 



(61) 
(62) 



In going from (61) to (62), we used Fact 1 and Fact 2. In the above calculation, to ensure 
that w is well-defined, we take it to be the candidate with the least norm. Thus by Equations 
(59), (60) and (62), 



{<y,z>y) < O(-) 



> 



\y\\\\z\ 



n 



+ \\y\\wM 



o 



n 



(63) 



> 98/100. 
(c) For some $w \in [0, z]$,



\z\\i- \\z\\^ - D-^F{0)[z,z,z]\ < sup 

w£[0,y] 



.D*F{w)[z,z,z,z] 



(64) 



By Lemma 4, 



D^F{0)[z,z,z] = O 



snp^^,^^^^^D^F{0)[v,v,v]\\z\ 



n 



> 99/100. 



|z|| = 0{l/^/n)] > 99/100, therefore, each term in (64) is 0(l/n^) with probability 



and so 



z\\l-\\zf<0{^) 



> 



98 

Too" 



(d) 



<y,z>z = <y,z > +{<y,z -<y,z >) 

< <y,z>+ sup \D^F{w)[y,z,z] 

we[o,z] 



or—) + iylLi^ll!, with probabihty > 99/100. 
n 



By Equations (65) and (66), 



\y\\ : n.ii2 " / 1 



n 



+ llylUII^L = o 



> 99/100. 



Therefore, 



{<y,z>,) < O(^) 



> 



100' 



□ 



6.4 Regularity of the metric defined by the self-concordant barrier 

Proof of Lemma 11. We trace the same steps involved in the proof of the last lemma, the only
difference being that of scale. We proceed to prove upper bounds of 0(l/n^) on each of the terms 
(a) $\|z\|_y^2 - \|z\|^2$, (b) $\langle y, z \rangle_y$, (c) $\|z\|_z^2 - \|z\|^2$ and (d) $\langle y, z \rangle_z$ that hold with constant probability
separately, and then use the union bound. We will repeatedly use the observation (that holds from
Fact 2) that for any point $w$ such that $\|w\| = o(1)$,

\\y\U<0{n-^\\y\\)<o(^\ (65) 

and with probability > 1 — 10^^ 

\\z\U<0{n-^\\z\\) <0{l/n). (66) 
(a)



\Z\\n, — \\Z\ 



D^F{y)[z,z]-D^Fmz,z] 
D^F{w)[y,z,zl 



for some w on the line segment [0, y]. 



D'^F(w)[y,z,z] < \\y\U\z\\l by Fact 1 

< (n"2/2)(-^-i)2 ^-^j^ probability > 1 - 10"^ 



(b) 



<y,z >, 



<y, z > +i<y, z >y -<y, z >) 
<y,z> +D^F{w)[y,y,z] 



O 



n\/n 



+ O (llyll^ll^ll^) with probability > 99/100. 



In going from (61) to (62), we used Fact 1 and Fact 2. We see that 

1 



+ 



\z\\,n = O 



n\ n 



Therefore, 



(<2/,Z >y) < O(^) 



> 



> 99/100, 



98 

Too' 



(c) 



= D'^F{z)[z,z]- D^F{0)[z,z] 



D^F{w)[z,z,z] 



for some w on the line segment [0,z]. By Fact 1, 

D^F{w)[z,z,z] < \\z\\ 

w\\Z\\u} 



< O ( ^ 1 with probabihty > 1 - 10~ 



(d) 



<y,z>z = <y,z > +{<y,z >^ -<y,z >) 
< <y,z>+ sup D^F{w)[y,z,z] 



rix/n 



we[0,; 

+ ijylUP-^ll^ with probability > 99/100. 



By Equations (65) and (66), 



Therefore, 



n 



1 j^j 



{<y,z>,) < O(^) 



> 



> 99/100. 



100' 
6.5 Bound on Conductance

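Lemma 14 (Bound on Conductance). Let $\mu$ be the uniform distribution on $K$. The conductance

$$\Phi := \inf_{\mu(S_1) \le \frac{1}{2}} \frac{\int_{S_1} P_x(K \setminus S_1)\, d\mu(x)}{\mu(S_1)}$$

of the Markov Chain in Algorithm 1 is $\Omega(\beta_{fat})$.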

Proof. Let $S_1$ be a measurable subset of $K$ such that $\mu(S_1) \le \frac{1}{2}$ and $S_2 := K \setminus S_1$ be its complement.
For any $x \ne y \in K$,

$$\frac{dP_y}{d\mu}(x) = \frac{dP_x}{d\mu}(y).$$

Let $S_1' = S_1 \cap \{x \mid P_x(S_2) = o(1)\}$ and $S_2' = S_2 \cap \{y \mid P_y(S_1) = o(1)\}$. By the reversibility of the
chain, which is easily checked,

$$\int_{S_1} P_x(S_2)\, d\mu(x) = \int_{S_2} P_y(S_1)\, d\mu(y).$$

If $x \in S_1'$ and $y \in S_2'$ then

$$d_{TV}(P_x, P_y) := 1 - \int_K \min\left(\frac{dP_x}{d\mu}(w), \frac{dP_y}{d\mu}(w)\right) d\mu(w) = 1 - o(1).$$

Lemma 6 implies that if d{x, y) < ^v^, then dTviPx, Py) = 1 — r2(l). Therefore Theorem 5 implies 
that 

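$$\mu((K \setminus S_1') \setminus S_2') \ge \Omega(\beta_{fat}) \min(\mu(S_1'), \mu(S_2')).$$

First suppose $\mu(S_1') \ge (1 - \Omega(1))\mu(S_1)$ and $\mu(S_2') \ge (1 - \Omega(1))\mu(S_2)$. Then,

$$\int_{S_1} P_x(S_2)\, d\mu(x) \ge \mu(K \setminus S_1' \setminus S_2') \ge \Omega(\beta_{fat}) \min(\mu(S_1), \mu(S_2))$$

and we are done. Otherwise, without loss of generality, suppose $\mu(S_1') \le (1 - \Omega(1))\mu(S_1)$. Then

$$\int_{S_1} P_x(S_2)\, d\mu(x) \ge \Omega(\mu(S_1))$$

and we are done. □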
6.6 Mixing Bounds 

Let $e$ be the first time the Markov chain escapes $x_0 = 0$. Thus $e$ is an integer valued random
variable defined by the event that $y_e$ is the first point in $y_1, \dots$ that is not equal to $y_0$. Let the
density of $y_e$ be $\rho_e$. For all $t$, we know that $\mathbb{P}[e > t + 1 \mid e > t] \le 1 - \Omega(1)$, therefore

sup^ = 0(1). (74) 
Therefore



/ Peixfdx = 0{ [ Geixfdx) (75) 

Jk JR" 

= exp (0(nln(nzv))) . (76) 

Together with the following theorem, setting $f(x) := \rho(x_e) - (\mathrm{vol}(K))^{-1}$, this completes the proof
of mixing from a fixed point. The proof of mixing from a warm start follows similarly.

Theorem 9 (Lovász-Simonovits [17]). Let $\mu_0$ be the initial distribution for a lazy reversible ergodic
Markov chain whose conductance is $\Phi$ and stationary measure is $\mu$, and $\mu_k$ be the distribution of
the $k$-th step. Let $M := \sup_S \frac{\mu_0(S)}{\mu(S)}$, where the supremum is over all measurable subsets $S$ of $K$. For
every bounded $f$, let $\|f\|_{2,\mu}$ denote $\sqrt{\int_K f(x)^2\, d\mu(x)}$. For any fixed $f$, let $Ef$ be the map that takes
$x$ to $\int_K f(y)\, dP_x(y)$. Then,

1. for all $S$,

$$|\mu_k(S) - \mu(S)| \le \sqrt{M}\left(1 - \frac{\Phi^2}{2}\right)^k;$$

2. if $\int_K f(x)\, d\mu(x) = 0$,

$$\|E^k f\|_{2,\mu} \le \left(1 - \frac{\Phi^2}{2}\right)^k \|f\|_{2,\mu}.$$

□
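The first conclusion of Theorem 9 can be illustrated on a finite-state analogue. The sketch below is not the Dikin walk: it uses a lazy random walk on a 6-cycle (a hypothetical stand-in chosen so that the conductance can be found by brute force over all subsets) and checks that the total variation distance to the uniform stationary distribution stays below $\sqrt{M}(1 - \Phi^2/2)^k$, where $M = 1/\mu(\{x_0\})$ for a start at a single state.

```python
import itertools
import numpy as np

def lazy_cycle(m):
    """Transition matrix of the lazy simple random walk on a cycle with m >= 3 states."""
    P = np.zeros((m, m))
    for i in range(m):
        P[i, i] = 0.5
        P[i, (i - 1) % m] += 0.25
        P[i, (i + 1) % m] += 0.25
    return P

def conductance(P, pi):
    """Brute force: min over S with 0 < pi(S) <= 1/2 of flow(S, complement) / pi(S)."""
    m = len(pi)
    best = np.inf
    for size in range(1, m):
        for S in itertools.combinations(range(m), size):
            S = list(S)
            mass = pi[S].sum()
            if mass > 0.5 + 1e-12:
                continue
            comp = [j for j in range(m) if j not in S]
            flow = (pi[S][:, None] * P[np.ix_(S, comp)]).sum()
            best = min(best, flow / mass)
    return float(best)

def tv_after_k_steps(P, pi, start, k):
    """Total variation distance between the k-step distribution and pi."""
    mu = np.zeros(len(pi))
    mu[start] = 1.0
    for _ in range(k):
        mu = mu @ P
    return 0.5 * float(np.abs(mu - pi).sum())

m = 6
P = lazy_cycle(m)
pi = np.full(m, 1.0 / m)
phi = conductance(P, pi)
M = 1.0 / pi[0]
bound_holds = all(
    tv_after_k_steps(P, pi, 0, k) <= np.sqrt(M) * (1 - phi**2 / 2) ** k + 1e-12
    for k in range(60)
)
print(phi, bound_holds)
```

On this example the actual decay is much faster than the conductance bound, which is typical: the theorem gives a worst-case rate.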



7 Mixing in a direct product of convex sets 

Our analysis hinges upon a lower bound on the Cheeger constant $\beta_{fat}$, which is obtained by
comparing the isoperimetry of the weighted manifold $\mathcal{M}$ obtained by equipping $K$ with the metric from
the Hessian of $F$, with the isoperimetry of the Dikin metric on the $h$-dimensional cube $[-\frac{\pi}{2}, \frac{\pi}{2}]^h$,
with respect to the barrier $F_h(x_1, \dots, x_h) := -\sum_{i=1}^h \ln\cos(x_i)$ (see Section 6.2, [27]).

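For a manifold $\mathcal{M}$ equipped with a measure $\mu$ and metric $d$, let the Minkowski outer measure
of a (measurable) set $A$ be defined as

$$\mu^+(A) := \liminf_{\epsilon \to 0^+} \frac{\mu(A_\epsilon) - \mu(A)}{\epsilon}, \quad (77)$$

where $A_\epsilon := \{x \mid d_{\mathcal{M}}(x, A) < \epsilon\}$.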

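Definition 1. The (infinitesimal) Cheeger constant of the weighted manifold $(\mathcal{M}, \mu)$ is

$$\beta_{\mathcal{M}} = \inf_{A \subset \mathcal{M},\ \mu(A) \le \frac{1}{2}} \frac{\mu^+(A)}{\mu(A)}, \quad (78)$$

where the infimum is taken over measurable subsets.

The isoperimetric function of $\mu$ is the largest function $I_\mu$ such that $\mu^+(A) \ge I_\mu(\mu(A))$ holds for all
Borel sets.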

Let the $h$-fold product space $\mathcal{M} \times \cdots \times \mathcal{M}$ be denoted $\mathcal{M}^h$, where the distance between points
$(x_1, \dots, x_h)$ and $(y_1, \dots, y_h)$ is $\sqrt{\sum_i d(x_i, y_i)^2}$.

We will need the following theorem of Bobkov and Houdre (Theorem 1.1 [5]). 
Theorem 1. For any triple $(\mathcal{M}, d, \mu)$ as above,

$$\beta_{\mathcal{M}^h} \ge \frac{\beta_{\mathcal{M}}}{2\sqrt{6}}. \quad (79)$$



We will also need the following theorem, which is a modification of Barthe (Theorem 10, [2]), 
obtained by scaling the metric on M by 

Theorem 2. Let k > 2 be an integer. For i = 1, . . . ,k, let (Xj, di,jii) be a Riemannian manifold, 
with its geodesic distance and an absolutely continuous Borel measure of probability and let Aj be a 
probability measure on M with even log-concave density. If I^^ > ^ for i = 1, . . . ,k, then 

T ^ -^Ai(gi---(giAfc 

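The following lemma allows us to relate the "fat" Cheeger constant $\beta_{fat}$ with the infinitesimal
version $\beta_{\mathcal{M}}$.

Lemma 15. Let $A, B \subseteq \mathcal{M}$ and $d_{\mathcal{M}}(A, B) \ge \delta_{\mathcal{M}}$. Then,

$$\mu(\mathcal{M} \setminus \{A \cup B\}) \ge 2\min(\mu(A), \mu(B))\left(e^{\beta_{\mathcal{M}}\delta_{\mathcal{M}}/2} - 1\right). \quad (80)$$

Proof. We will consider two cases.

First, suppose that $\max(\mu(A), \mu(B)) \le \frac{1}{2}$. Without loss of generality, we assume that $\mu(A) \le \mu(B)$. Then, let

$$\delta_1 := \sup_{\mu(A_\delta) \le \frac{1}{2}} \delta.$$

We proceed by contradiction. Suppose for some $\beta < \beta_{\mathcal{M}}$,

$$\exists \delta \in [0, \delta_1),\ \mu(A_\delta) < e^{\beta\delta}\mu(A). \quad (81)$$

Let $\delta'$ be the infimum of such $\delta$. Note that since $\mu(A_\delta)$ is a continuous, monotonically increasing
function of $\delta$,

$$\mu(A_{\delta'}) = e^{\beta\delta'}\mu(A).$$

However, we know that

$$\mu^+(A_{\delta'}) := \liminf_{\epsilon \to 0^+} \frac{\mu(A_{\delta' + \epsilon}) - \mu(A_{\delta'})}{\epsilon} \ge \beta_{\mathcal{M}}\, \mu(A_{\delta'}), \quad (82)$$

which contradicts the fact that in any right neighborhood of $\delta'$, there is a $\delta$ for which (81) holds.
This proves that for all $\delta \in [0, \delta_1)$, $\mu(A_\delta) \ge e^{\beta_{\mathcal{M}}\delta}\mu(A)$. We note that $A_{\delta_1} \cap B_{\delta_{\mathcal{M}} - \delta_1} = \emptyset$, therefore
$\mu(B_{\delta_{\mathcal{M}} - \delta_1}) \le \frac{1}{2}$. So the same argument tells us that

$$\mu(B_{\delta_{\mathcal{M}} - \delta_1}) \ge e^{\beta_{\mathcal{M}}(\delta_{\mathcal{M}} - \delta_1)}\mu(B). \quad (83)$$

Thus, $\mu(\mathcal{M} \setminus \{A \cup B\}) \ge \mu(A)\left(e^{\beta_{\mathcal{M}}(\delta_{\mathcal{M}} - \delta_1)} + e^{\beta_{\mathcal{M}}\delta_1} - 2\right)$. This implies that

$$\mu(\mathcal{M} \setminus \{A \cup B\}) \ge 2\mu(A)\left(e^{\beta_{\mathcal{M}}\delta_{\mathcal{M}}/2} - 1\right).$$

Next, suppose $\mu(B) > \frac{1}{2}$. We then set $\delta_1 := \delta_{\mathcal{M}}$, and see that the arguments from (81) to (83)
carry through verbatim. Thus, in this case, $\mu(\mathcal{M} \setminus \{A \cup B\}) \ge \min(\mu(A), \mu(B))\left(e^{\beta_{\mathcal{M}}\delta_{\mathcal{M}}} - 1\right)$.

□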
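The final implication in the first case of the proof of Lemma 15 uses only the ordering of the two measures and the arithmetic-geometric mean inequality. A short worked derivation is given below; the shorthand $a$ and $b$ for the two exponents is introduced here for readability and is not notation from the paper.

```latex
% Shorthand (illustrative): a = \beta_{\mathcal M}(\delta_{\mathcal M} - \delta_1),
%                           b = \beta_{\mathcal M}\,\delta_1, so a + b = \beta_{\mathcal M}\delta_{\mathcal M}.
\begin{align*}
\mu(\mathcal{M} \setminus (A \cup B))
  &\ge \mu(A_{\delta_1} \setminus A) + \mu(B_{\delta_{\mathcal M} - \delta_1} \setminus B)
     && \text{(the two neighborhoods are disjoint)} \\
  &\ge \mu(A)\,(e^{b} - 1) + \mu(B)\,(e^{a} - 1)
     && \text{(exponential growth of both neighborhoods)} \\
  &\ge \mu(A)\,(e^{a} + e^{b} - 2)
     && \text{(since } \mu(A) \le \mu(B)\text{)} \\
  &\ge \mu(A)\,\bigl(2\sqrt{e^{a+b}} - 2\bigr)
     && \text{(AM-GM)} \\
  &= 2\mu(A)\,\bigl(e^{\beta_{\mathcal M}\delta_{\mathcal M}/2} - 1\bigr).
\end{align*}
```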
This immediately leads to the following corollary:



Corollary 1. 



Nesterov and Todd show in (Lemma 4.1, [27]) that the Riemannian metric $\mathcal{M}$ on the direct
product of convex sets induced by $\sum_{i=1}^h F_i$ is the same as the direct product of the Riemannian
metrics $\mathcal{M}_i$ induced by individual $F_i$ on the respective convex sets $K_i$.

Proof of Theorem 3. By Corollary 1, it suffices to show that = ^{^). We will show this using 

Theorem 2 and Theorem 1. Consider the $h$-dimensional cube $[-\frac{\pi}{2}, \frac{\pi}{2}]^h$, and the metric from the
Hessian of the barrier $F_h(x_1, \dots, x_h) := -\sum_j \ln\cos(x_j)$ (see Section 6.2, [27]). The map

$$\psi : (x_1, \dots, x_h) \mapsto \left(\ln(\sec(x_1) + \tan(x_1)), \dots, \ln(\sec(x_h) + \tan(x_h))\right)$$

maps the cube with the Hessian metric isometrically onto Euclidean space, and the push-forward
of the uniform density on the cube is a density $\phi \otimes \cdots \otimes \phi$ on $\mathbb{R}^h$, where

$$\phi(x) = \frac{\cos\left(2\arctan(e^x) - \frac{\pi}{2}\right)}{\pi}$$

and

$$\frac{d^2 \ln\phi}{dx^2} = -\frac{4e^{2x}}{(1 + e^{2x})^2} < 0,$$

and the density is even (thus meeting the conditions of Theorem 2). It is easy to check that the
barrier is 1— self-concordant (Section 6.2 [27]). Therefore, in the 1— dimensional case, is bounded 
above and below by fixed constants. This, together with Theorem 5 implies that for each Aii and 
the uniform measure /ij (on Ki), the isoperimetric profile of {Aii,^i) satisfies > 
Now applying Theorem 2 and Theorem 1 in succession, we see that 



where 1 is the constant function taking the value 1. Therefore, 



□ 
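The one-dimensional computations behind the map $\psi$ and the density $\phi$ can be verified numerically. The following sketch (helper names are illustrative) checks that $\psi'(x) = \sec(x) = \sqrt{F''(x)}$ for $F(x) = -\ln\cos(x)$, so that $\psi$ is an arc-length parametrisation, that $\phi$ coincides with $\mathrm{sech}(y)/\pi$ and integrates to 1, and that $\ln\phi$ has the stated negative second derivative.

```python
import numpy as np

def psi(x):
    """Arc length from 0 to x in the Hessian metric of F(x) = -ln cos(x)."""
    return np.log(1.0 / np.cos(x) + np.tan(x))

def phi(y):
    """Push-forward under psi of the uniform density 1/pi on (-pi/2, pi/2)."""
    return np.cos(2.0 * np.arctan(np.exp(y)) - np.pi / 2.0) / np.pi

# psi'(x) should equal sec(x), the square root of F''(x) = sec(x)^2.
xs = np.linspace(-1.2, 1.2, 25)
h = 1e-5
dpsi = (psi(xs + h) - psi(xs - h)) / (2 * h)
isometry_ok = np.allclose(dpsi, 1.0 / np.cos(xs), rtol=1e-6)

# phi(y) should equal sech(y) / pi and integrate to 1 over the real line.
ys = np.linspace(-30.0, 30.0, 200001)
vals = phi(ys)
sech_ok = np.allclose(vals, 1.0 / (np.pi * np.cosh(ys)), atol=1e-12)
dy = ys[1] - ys[0]
integral = float(np.sum((vals[1:] + vals[:-1]) / 2.0) * dy)   # trapezoid rule

# Second derivative of ln(phi) against -4 e^{2y} / (1 + e^{2y})^2.
yt = np.linspace(-3.0, 3.0, 13)
k = 1e-3
second = (np.log(phi(yt + k)) - 2 * np.log(phi(yt)) + np.log(phi(yt - k))) / k**2
closed_form = -4.0 * np.exp(2 * yt) / (1.0 + np.exp(2 * yt)) ** 2
logconcave_ok = np.allclose(second, closed_form, atol=1e-5) and np.all(closed_form < 0)

print(isometry_ok, sech_ok, integral, logconcave_ok)
```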



8 Analysis of Las Vegas Algorithm 

Proof of Theorem 4. In contrast with the Markov chain used for linear programming in [11], this
Markov chain is not ergodic and has no stationary probability distribution. We will analyze its
behavior up to time $t$ by relating this to the limiting behavior of Dikin walks on a family of convex
sets $\{K_j\}_{j \ge 1}$, each contained in the next, such that $K_j = K \cap \{x \mid c^T x \le j\}$. Note that for any fixed
$j$, our mixing results from Theorem 2 apply since $K_j$ is bounded. Let $F_j = F - \ln(j - c^T x)$. By
known properties of barriers ([23]), the self-concordance parameter of $F_j$ is at most 1 more than
that of $F$. As $j$ tends to $\infty$, $D^2F_j$ converges uniformly to $D^2F$ in the $2 \to 2$ operator norm on any
compact subset of $K$. Therefore, for any $t$, the distribution of the $t$-tuple $(T(x_0), T(x_1), \dots, T(x_t))$
is the limit in total variation distance of the distributions of $t$-tuples $(T(x_0^j), T(x_1^j), \dots, T(x_t^j))$, where
$(x_0^j, \dots, x_t^j)$ is a random walk on $K_j$ starting at 0.
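The uniform convergence of $D^2F_j$ to $D^2F$ on compact sets comes down to the size of the rank-one term contributed by the extra logarithmic barrier of the cut $c^T x \le j$, namely $c c^T / (j - c^T x)^2$, whose operator norm is $\|c\|^2 / (j - c^T x)^2$. The sketch below, with a hypothetical vector `c` and helper name, simply tabulates this norm as $j$ grows.

```python
import numpy as np

def cut_hessian_norm(c, x, j):
    """Operator norm of c c^T / (j - c^T x)^2, the Hessian of -ln(j - c^T x)."""
    slack = j - float(c @ x)
    if slack <= 0:
        raise ValueError("x must satisfy c^T x < j")
    return float(c @ c) / slack**2

c = np.ones(4)                       # illustrative objective vector
x = np.array([0.5, 0.0, -0.25, 0.25])
norms = [cut_hessian_norm(c, x, j) for j in (10.0, 100.0, 1000.0, 10000.0)]

# Cross-check against the spectral norm of the explicit rank-one matrix.
j = 10.0
explicit = np.linalg.norm(np.outer(c, c) / (j - c @ x) ** 2, 2)
print(norms, explicit)
```

On a compact subset where $c^T x \le B$, the same formula gives the uniform bound $\|c\|^2 / (j - B)^2$, which tends to 0 as $j$ tends to infinity.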
We will now give an upper bound for $\mathbb{P}[c^T x_t \le 1 - \epsilon]$. Let $y_i := T(x_i)$. Let $e$ be the first time
the Markov chain $y_0, y_1, \dots$ escapes $y_0$. Thus $e$ is an integer valued random variable defined by the
event that $y_e$ is the first point in $y_1, \dots$ that is not equal to $y_0$. Let the density of $y_e$ be $\rho_e$. For all $t$, we
know from Lemma 6 that $\mathbb{P}[e > t + 1 \mid e > t] \le 1 - \Omega(1)$, therefore

sup^ = 0(1). (84) 

Without loss of generality, in the rest of this proof, we assume that for all $v$, $D^2F(0)[v, v] = \|v\|^2$.
Therefore

$$\int_K \rho_e(x)^2\, dx = O\left(\int_{\mathbb{R}^n} G_0(x)^2\, dx\right) \quad (85)$$
$$= \exp(O(n \ln n)). \quad (86)$$


Let pt{x) be the density of xt- Then, by Theorem 2 and the fact that as far as total variation 
distance is concerned, a random walk on K can be viewed as the limit of random walks on the 
as j — )• CO, 

J^Pe{xrdx-^^^'-Y^^- 

By Lemma 14, this is O (exp (^-^i^Ml! 
Lemma 16. Let p be a density supported on K such that 

p{x)dx > 5. 

Then, 



j p{x)'^dx > (5^ exp ^— O In 



Proof. Let be X H {c x < 1} and ■= TK^- Given four coUinear points a,b,c,d, {a : h : c : 

(a~d)'-(b~c) called the the cross ratio. Let j/ q' 9 be a chord of K^, and p = T^^{p') and 

q = T^^(q'). If c^p' < c^q', then |^ < |^ < s. On the other hand, if c^p' > c^q', let r be the 

intersection ofpq with {c^x < 1}. By the projective invariance of the cross ratio (see for example. 
Lemma 14 in [11]) 

(cxD : : p : q) = {r : : p : q) . 



Therefore 



\p'\ _ f\P_ 

's + l 



\q'\ \\Q\/ \\p- 



Thus 



< (^) 



sup In = O (in - 
where the supremum is taken over all chords of $K$ containing the origin. By (89) and Theorem 7
and Theorem 8, it follows that

sup ln||/i|| < O Mn 



and therefore 



InvolK, <0(nln(^^)) . (90) 



Let p be a density supported on K such that 

p{x)dx > 5. 



Then, 



p{xYdx > / pixydx (91) 
K Jkc 

> (92) 

VolKe 



> (5^exp(^-0(^nln(^— jjj . By (90). (93) 

□ 

Using the above Lemma 16, if 5 > P [c^xt > 1 — e] , we have 

tiPfatf < O (nln (^)) - InP [c^xt > 1 - e] . 

Therefore, 

P [c^xt < 1 - e] < exp (O (nln (^)) - t(/3/at)') . 

Therefore, 

F[{3t>T)c^Xr<l-e] < J^exp(o(nln(^)) -t(/3j,i)2) (94) 



t>T 



1 - exp (- (/3/at) 
Therefore, for any 6, P [{3t > r) c^Xr < 1 — e] < S for 



< exp(0(nln(f))-r(^;,,)^) _ ^^^^ 



T = { TTT^ (in i + fnln 



which together with Theorem 5 completes the proof. □ 